Thread: can we mark upper/lower/textlike functions leakproof?

can we mark upper/lower/textlike functions leakproof?

From
Andrew Dunstan
Date:
Several years ago, an EDB customer complained that if they used a 
functional index involving upper(), lower(), or textlike(), where RLS 
was involved, the indexes were not used because these functions are not 
marked leakproof. We presented the customer with several options for 
addressing the problem, the simplest of which was simply to mark the 
functions as leakproof, and this was the solution they adopted.

The consensus of discussion at the time among our senior developers was 
that there was probably no reason why at least upper() and lower() 
should not be marked leakproof, and quite possibly initcap() and 
textlike() also. It was suggested that we had not been terribly rigorous 
in assessing whether or not functions can be considered leakproof.

At the time we should have asked the community about it, but we didn't.

Fast forward to now. The customer has found no observable ill effects of 
marking these functions leakproof. The would like to know if there is 
any reason why we can't mark them leakproof, so that they don't have to 
do this in every database of every cluster they use.

Thoughts?


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: can we mark upper/lower/textlike functions leakproof?

From
David Rowley
Date:
On Wed, 31 Jul 2024 at 09:35, Andrew Dunstan <andrew@dunslane.net> wrote:
> Fast forward to now. The customer has found no observable ill effects of
> marking these functions leakproof. The would like to know if there is
> any reason why we can't mark them leakproof, so that they don't have to
> do this in every database of every cluster they use.
>
> Thoughts?

According to [1], it's just not been done yet due to concerns about
risk to reward ratios.  Nobody mentioned any reason why it couldn't
be, but there were some fears that future code changes could yield new
failure paths.

David

[1] https://postgr.es/m/02BDFCCF-BDBB-4658-9717-4D95F9A91561%40thebuild.com



Re: can we mark upper/lower/textlike functions leakproof?

From
Andrew Dunstan
Date:


On 2024-07-30 Tu 6:51 PM, David Rowley wrote:
On Wed, 31 Jul 2024 at 09:35, Andrew Dunstan <andrew@dunslane.net> wrote:
Fast forward to now. The customer has found no observable ill effects of
marking these functions leakproof. The would like to know if there is
any reason why we can't mark them leakproof, so that they don't have to
do this in every database of every cluster they use.

Thoughts?
According to [1], it's just not been done yet due to concerns about
risk to reward ratios.  Nobody mentioned any reason why it couldn't
be, but there were some fears that future code changes could yield new
failure paths.

David

[1] https://postgr.es/m/02BDFCCF-BDBB-4658-9717-4D95F9A91561%40thebuild.com


Hmm, somehow I missed that thread in searching, and clearly I'd forgotten it.

Still, I'm not terribly convinced by arguments along the lines you're suggesting. "Sufficient unto the day is the evil thereof." Maybe we need a test to make sure we don't make changes along those lines, although I have no idea what such a test would look like.


cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

Re: can we mark upper/lower/textlike functions leakproof?

From
Joe Conway
Date:
On 7/31/24 05:47, Andrew Dunstan wrote:
> On 2024-07-30 Tu 6:51 PM, David Rowley wrote:
>> On Wed, 31 Jul 2024 at 09:35, Andrew Dunstan<andrew@dunslane.net> wrote:
>>> Fast forward to now. The customer has found no observable ill effects of
>>> marking these functions leakproof. The would like to know if there is
>>> any reason why we can't mark them leakproof, so that they don't have to
>>> do this in every database of every cluster they use.
>>>
>>> Thoughts?
>> According to [1], it's just not been done yet due to concerns about
>> risk to reward ratios.  Nobody mentioned any reason why it couldn't
>> be, but there were some fears that future code changes could yield new
>> failure paths.
>>
>> David
>>
>> [1]https://postgr.es/m/02BDFCCF-BDBB-4658-9717-4D95F9A91561%40thebuild.com
> 
> Hmm, somehow I missed that thread in searching, and clearly I'd 
> forgotten it.
> 
> Still, I'm not terribly convinced by arguments along the lines you're 
> suggesting. "Sufficient unto the day is the evil thereof." Maybe we need 
> a test to make sure we don't make changes along those lines, although I 
> have no idea what such a test would look like.


I think I have expressed this opinion before (which was shot down), and 
I will grant that it is hand-wavy, but I will give it another try.

In my opinion, for this use case and others, it should be possible to 
redact the values substituted into log messages based on some criteria. 
One of those criteria could be "I am in a leakproof call right now". In 
fact in a similar fashion, an extension ought to be able to mutate the 
log message based on the entire string, e.g. when "ALTER 
ROLE...PASSWORD..." is spotted I would like to be able to redact 
everything after "PASSWORD".

Yes it might render the error message unhelpful, but I know of users 
that would accept that tradeoff in order to get better performance and 
security on their production workloads. Or in some cases (e.g. PASSWORD) 
just better security.

-- 
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: can we mark upper/lower/textlike functions leakproof?

From
Andrew Dunstan
Date:
On 2024-07-31 We 9:14 AM, Joe Conway wrote:
> On 7/31/24 05:47, Andrew Dunstan wrote:
>> On 2024-07-30 Tu 6:51 PM, David Rowley wrote:
>>> On Wed, 31 Jul 2024 at 09:35, Andrew Dunstan<andrew@dunslane.net> 
>>> wrote:
>>>> Fast forward to now. The customer has found no observable ill 
>>>> effects of
>>>> marking these functions leakproof. The would like to know if there is
>>>> any reason why we can't mark them leakproof, so that they don't 
>>>> have to
>>>> do this in every database of every cluster they use.
>>>>
>>>> Thoughts?
>>> According to [1], it's just not been done yet due to concerns about
>>> risk to reward ratios.  Nobody mentioned any reason why it couldn't
>>> be, but there were some fears that future code changes could yield new
>>> failure paths.
>>>
>>> David
>>>
>>> [1]https://postgr.es/m/02BDFCCF-BDBB-4658-9717-4D95F9A91561%40thebuild.com 
>>>
>>
>> Hmm, somehow I missed that thread in searching, and clearly I'd 
>> forgotten it.
>>
>> Still, I'm not terribly convinced by arguments along the lines you're 
>> suggesting. "Sufficient unto the day is the evil thereof." Maybe we 
>> need a test to make sure we don't make changes along those lines, 
>> although I have no idea what such a test would look like.
>
>
> I think I have expressed this opinion before (which was shot down), 
> and I will grant that it is hand-wavy, but I will give it another try.
>
> In my opinion, for this use case and others, it should be possible to 
> redact the values substituted into log messages based on some 
> criteria. One of those criteria could be "I am in a leakproof call 
> right now". In fact in a similar fashion, an extension ought to be 
> able to mutate the log message based on the entire string, e.g. when 
> "ALTER ROLE...PASSWORD..." is spotted I would like to be able to 
> redact everything after "PASSWORD".
>
> Yes it might render the error message unhelpful, but I know of users 
> that would accept that tradeoff in order to get better performance and 
> security on their production workloads. Or in some cases (e.g. 
> PASSWORD) just better security.
>
--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: can we mark upper/lower/textlike functions leakproof?

From
Robert Haas
Date:
On Wed, Jul 31, 2024 at 9:14 AM Joe Conway <mail@joeconway.com> wrote:
> In my opinion, for this use case and others, it should be possible to
> redact the values substituted into log messages based on some criteria.
> One of those criteria could be "I am in a leakproof call right now". In
> fact in a similar fashion, an extension ought to be able to mutate the
> log message based on the entire string, e.g. when "ALTER
> ROLE...PASSWORD..." is spotted I would like to be able to redact
> everything after "PASSWORD".

This might be helpful, and unfortunately I'm all too familiar with the
ALTER ROLE...PASSWORD example, but I think it's not really related to
the question of whether we can mark upper() and lower() leakproof.

If there are some inputs that cause upper() and lower() to fail and
others that do not, the functions aren't leakproof, because an
attacker can extract information about values that they can't see by
feeding those values into these functions and seeing whether they get
a failure or not. It doesn't matter what error message is produced;
the fact that the function throws an error of any kind for some input
values but enough for others is enough to make it unsafe, and it seems
to me that we've repeatedly found that there's often a way to turn
even what seems like a small leak into a very large one, so we need to
be quite careful here.

Annoyingly, we have a WHOLE BUNCH of different code paths that need to
be assessed individually here. I think we can ignore the fact that
upper() can throw errors when it's unclear which collation to use; I
think that's a property of the query string rather than the input
value. It's a bit less clear whether we can ignore out of memory
conditions, but my judgement is that a possible OOM from a small
allocation is not generally going to be useful as an attack vector. If
a long string produces an error and a short one doesn't, that might
qualify as a real leak. And other errors that depend on the input
value are also leaks. So let's go through the code paths:

- When lc_ctype_is_c(), we call asc_toupper(), which calls
pg_ascii_toupper(), which just replaces a-z with A-Z. No errors.

- When mylocale->provider == COLLPROVIDER_ICU, we call icu_to_uchar()
and icu_convert_case(). icu_to_uchar() calls uchar_length(), which has
this:

        if (U_FAILURE(status) && status != U_BUFFER_OVERFLOW_ERROR)
                ereport(ERROR,
                                (errmsg("%s failed: %s",
"ucnv_toUChars", u_errorName(status))));

I don't know what errors can be reported here, or if this ever fails
in practice, so I can't prove it never fails consistently for some
particular input string. uchar_convert() and icu_convert_case() have
similar error reporting.

- When mylocale->provider == COLLPROVIDER_BUILTIN, we call
unicode_strupper() which calls unicode_to_utf8() which clearly throws
no errors. Likewise unicode_utf8len() does not error. I don't see how
we can get an error out of this path.

- In cases not covered by the above, we take different paths depending
on whether a multi-byte encoding is in use. In single-byte encodings,
we rely on either pg_toupper() or toupper_l(). The latter is an
OS-provided function so can't ereport(), and the former calls either
does the work itself or calls toupper() which again is an OS-provided
function and can't report().

- Finally, when mylocale == NULL or the provider is COLLPROVIDER_LIBC
and the encoding is multi-byte, we use char2wchar(), then towupper_l()
or towupper(), then wchar2char(). The whole thing can fall apart if
the string is too long, which might be enough to declare this
leakproof but it depends on whether the error guarded by /* Overflow
paranoia */ is really just paranoia or whether it's actually
reachable. Otherwise, towupper*() won't ereport because it's not part
of PG, so we need to assess char2wchar() and wchar2char(). Here I note
that char2wchar() claims that it does an ereport() if the input is
invalidly encoded. This is kind of surprising when you think about it,
because our usual policy is not to allow invalidly encoded data into
the database in the first place, but whoever wrote this thought it
would be possible to hit this case "if LC_CTYPE locale is different
from the database encoding." But it seems that the logic here dates to
commit 2ab0796d7a3a7116a79b65531fd33f1548514b52 back in 2011, so it
seems at least possible to me that we've tightened things up enough
since then that you can't actually hit this any more. But then again,
maybe not.

So in summary, I think upper() is ... pretty close to leakproof. But
if ICU sometimes fails on certain strings, then it isn't. And if the
multi-byte libc path can be made to fail reliably either with really
long strings or with certain choices of the LC_CTYPE locale, then it
isn't.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: can we mark upper/lower/textlike functions leakproof?

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> If there are some inputs that cause upper() and lower() to fail and
> others that do not, the functions aren't leakproof, because an
> attacker can extract information about values that they can't see by
> feeding those values into these functions and seeing whether they get
> a failure or not.

> [ rather exhaustive analysis redacted ]

> So in summary, I think upper() is ... pretty close to leakproof. But
> if ICU sometimes fails on certain strings, then it isn't. And if the
> multi-byte libc path can be made to fail reliably either with really
> long strings or with certain choices of the LC_CTYPE locale, then it
> isn't.

The problem here is that marking these functions leakproof is a
promise about a *whole bunch* of code, much of it not under our
control; worse, there's no reason to think all that code is stable.
A large fraction of it didn't even exist a few versions ago.

Even if we could convince ourselves that the possible issues Robert
mentions aren't real at the moment, I think marking these leakproof
is mighty risky.  It's unlikely we'd remember to revisit the marking
the next time someone drops a bunch of new code in here.

            regards, tom lane



Re: can we mark upper/lower/textlike functions leakproof?

From
Joe Conway
Date:
On 7/31/24 14:03, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> If there are some inputs that cause upper() and lower() to fail and
>> others that do not, the functions aren't leakproof, because an
>> attacker can extract information about values that they can't see by
>> feeding those values into these functions and seeing whether they get
>> a failure or not.
> 
>> [ rather exhaustive analysis redacted ]
> 
>> So in summary, I think upper() is ... pretty close to leakproof. But
>> if ICU sometimes fails on certain strings, then it isn't. And if the
>> multi-byte libc path can be made to fail reliably either with really
>> long strings or with certain choices of the LC_CTYPE locale, then it
>> isn't.
> 
> The problem here is that marking these functions leakproof is a
> promise about a *whole bunch* of code, much of it not under our
> control; worse, there's no reason to think all that code is stable.
> A large fraction of it didn't even exist a few versions ago.
> 
> Even if we could convince ourselves that the possible issues Robert
> mentions aren't real at the moment, I think marking these leakproof
> is mighty risky.  It's unlikely we'd remember to revisit the marking
> the next time someone drops a bunch of new code in here.


I still maintain that there is a whole host of users that would accept 
the risk of side channel attacks via existence of an error or not, if 
they could only be sure nothing sensitive leaks directly into the logs 
or to the clients. We should give them that choice.

-- 
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: can we mark upper/lower/textlike functions leakproof?

From
Andrew Dunstan
Date:
On 2024-07-31 We 2:43 PM, Joe Conway wrote:
> On 7/31/24 14:03, Tom Lane wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>>> If there are some inputs that cause upper() and lower() to fail and
>>> others that do not, the functions aren't leakproof, because an
>>> attacker can extract information about values that they can't see by
>>> feeding those values into these functions and seeing whether they get
>>> a failure or not.
>>
>>> [ rather exhaustive analysis redacted ]


First, thanks you very much, Robert for the analysis.


>>
>>> So in summary, I think upper() is ... pretty close to leakproof. But
>>> if ICU sometimes fails on certain strings, then it isn't. And if the
>>> multi-byte libc path can be made to fail reliably either with really
>>> long strings or with certain choices of the LC_CTYPE locale, then it
>>> isn't.
>>
>> The problem here is that marking these functions leakproof is a
>> promise about a *whole bunch* of code, much of it not under our
>> control; worse, there's no reason to think all that code is stable.
>> A large fraction of it didn't even exist a few versions ago.
>>
>> Even if we could convince ourselves that the possible issues Robert
>> mentions aren't real at the moment, I think marking these leakproof
>> is mighty risky.  It's unlikely we'd remember to revisit the marking
>> the next time someone drops a bunch of new code in here.
>
>
> I still maintain that there is a whole host of users that would accept 
> the risk of side channel attacks via existence of an error or not, if 
> they could only be sure nothing sensitive leaks directly into the logs 
> or to the clients. We should give them that choice.
>

As I meant to say in my previous empty reply, I think your suggestions 
make lots of sense.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: can we mark upper/lower/textlike functions leakproof?

From
Robert Haas
Date:
On Wed, Jul 31, 2024 at 2:43 PM Joe Conway <mail@joeconway.com> wrote:
> I still maintain that there is a whole host of users that would accept
> the risk of side channel attacks via existence of an error or not, if
> they could only be sure nothing sensitive leaks directly into the logs
> or to the clients. We should give them that choice.

I'm not sure what design you have in mind. A lot of possible designs
seem to end up like this:

1. You can't directly select the invisible value.

2. But you can write a plpgsql procedure that tries a bunch of things
in a loop and catches errors and uses which things error and which
things don't to figure out and return the invisible value.

And I would argue that's not really that useful. Especially if that
plpgsql procedure can extract the hidden values in like 1ms/row.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: can we mark upper/lower/textlike functions leakproof?

From
Robert Haas
Date:
On Wed, Jul 31, 2024 at 2:03 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The problem here is that marking these functions leakproof is a
> promise about a *whole bunch* of code, much of it not under our
> control; worse, there's no reason to think all that code is stable.
> A large fraction of it didn't even exist a few versions ago.

I think it's a fair critique. It could be reasonable to say, well, a
particular user could reasonably judge that the risk of marking
upper() as leak-proof is acceptable, but it's a little too risky for
us to want to do it as a project. After all, they can know things like
"we don't even use ICU," which we as a project cannot know.

However, the risk is that an end-user is going to be much less able to
evaluate what is and isn't safe than we are. I think some people are
going to be like -- well the core project doesn't mark enough stuff
leakproof, so I'll just go add markings to a bunch of stuff myself.
And they probably won't stop at stuff like UPPER which is almost
leakproof. They might add it to stuff such as LIKE which results in
immediately giving away the farm. By not giving people any guidance,
we invite them to make up their own rules.

Perhaps it's still right to be conservative; after all, no matter what
an end-user does, it's not a CVE for us, and CVEs suck. On the other
hand, shipping a system that is not useful in practice until you
modify a bunch of stuff also sucks, especially when it's not at all
clear which things are safe to modify.

I'm not sure what the right thing to do here is, but I think that it's
wrong to imagine that being unwilling to endorse probably-leakproof
things as leakproof -- or unwilling to put in the work to MAKE them
leakproof if they currently aren't -- has no security costs.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: can we mark upper/lower/textlike functions leakproof?

From
Joe Conway
Date:
On 7/31/24 16:10, Robert Haas wrote:
> On Wed, Jul 31, 2024 at 2:43 PM Joe Conway <mail@joeconway.com> wrote:
>> I still maintain that there is a whole host of users that would accept
>> the risk of side channel attacks via existence of an error or not, if
>> they could only be sure nothing sensitive leaks directly into the logs
>> or to the clients. We should give them that choice.
> 
> I'm not sure what design you have in mind. A lot of possible designs
> seem to end up like this:
> 
> 1. You can't directly select the invisible value.
> 
> 2. But you can write a plpgsql procedure that tries a bunch of things
> in a loop and catches errors and uses which things error and which
> things don't to figure out and return the invisible value.
> 
> And I would argue that's not really that useful. Especially if that
> plpgsql procedure can extract the hidden values in like 1ms/row.


You are assuming that everyone allows direct logins with the ability to 
create procedures. Plenty don't.

-- 
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: can we mark upper/lower/textlike functions leakproof?

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> I'm not sure what the right thing to do here is, but I think that it's
> wrong to imagine that being unwilling to endorse probably-leakproof
> things as leakproof -- or unwilling to put in the work to MAKE them
> leakproof if they currently aren't -- has no security costs.

Well, we *have* been a little bit spongy about that --- notably,
that texteq and friends are marked leakproof.  But IMV, marking
upper/lower as leakproof is substantially riskier and offers
substantially less benefit than those did.

In general, I'm worried about a slippery slope here.  If we
start marking things as leakproof because we cannot prove
they leak, rather than because we can prove they don't,
we are eventually going to find ourselves in a very bad place.

            regards, tom lane



Re: can we mark upper/lower/textlike functions leakproof?

From
Laurenz Albe
Date:
On Wed, 2024-07-31 at 14:43 -0400, Joe Conway wrote:
> I still maintain that there is a whole host of users that would accept
> the risk of side channel attacks via existence of an error or not, if
> they could only be sure nothing sensitive leaks directly into the logs
> or to the clients. We should give them that choice.

I think that you are right.

But what do you tell the users who would not accept that risk?

Yours,
Laurenz Albe



Re: can we mark upper/lower/textlike functions leakproof?

From
Robert Haas
Date:
On Wed, Jul 31, 2024 at 4:42 PM Joe Conway <mail@joeconway.com> wrote:
> You are assuming that everyone allows direct logins with the ability to
> create procedures. Plenty don't.

Well, if you can send queries, then you can do the same thing, driven
by client-side logic.

If you can't directly send queries in any form, then I guess things
are different. But I don't really understand what kind of system you
have in mind.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: can we mark upper/lower/textlike functions leakproof?

From
Joe Conway
Date:
On 8/1/24 07:17, Laurenz Albe wrote:
> On Wed, 2024-07-31 at 14:43 -0400, Joe Conway wrote:
>> I still maintain that there is a whole host of users that would accept 
>> the risk of side channel attacks via existence of an error or not, if 
>> they could only be sure nothing sensitive leaks directly into the logs 
>> or to the clients. We should give them that choice.
> 
> I think that you are right.

thanks

> But what do you tell the users who would not accept that risk?

Document that the option should not be used if that is the case

¯\_(ツ)_/¯

-- 
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: can we mark upper/lower/textlike functions leakproof?

From
Joe Conway
Date:
On 8/1/24 07:57, Robert Haas wrote:
> On Wed, Jul 31, 2024 at 4:42 PM Joe Conway <mail@joeconway.com> wrote:
>> You are assuming that everyone allows direct logins with the ability to
>> create procedures. Plenty don't.
> 
> Well, if you can send queries, then you can do the same thing, driven
> by client-side logic.

Sure. Of course you should be monitoring your production servers for 
anomalous workloads, no? "Gee, why is Joe running the same query 
millions of times that keeps throwing errors? Maybe we should go see 
what Joe is up to"

> If you can't directly send queries in any form, then I guess things
> are different.

Right, and there are plenty of those. I have even worked with at least 
one rather large one on behalf of your employer some years ago.

> But I don't really understand what kind of system you have in mind.

Well I did say I was being hand wavy ;-)

It has been a long time since I thought deeply about this. I will try to 
come back with something more concrete if no one beats me to it.

-- 
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: can we mark upper/lower/textlike functions leakproof?

From
Tom Lane
Date:
Joe Conway <mail@joeconway.com> writes:
> On 8/1/24 07:17, Laurenz Albe wrote:
>> But what do you tell the users who would not accept that risk?

> Document that the option should not be used if that is the case

Are you proposing that we invent two levels of leakproofness
with different guarantees?  That seems like a mess, not least
because there are going to be varying opinions about where we
should set the bar for the lower level.

            regards, tom lane



Re: can we mark upper/lower/textlike functions leakproof?

From
Robert Haas
Date:
On Thu, Aug 1, 2024 at 10:05 AM Joe Conway <mail@joeconway.com> wrote:
> Sure. Of course you should be monitoring your production servers for
> anomalous workloads, no? "Gee, why is Joe running the same query
> millions of times that keeps throwing errors? Maybe we should go see
> what Joe is up to"

I think it's possible that something like this could be part of some
useful approach, but I think it's difficult. If it would take Joe a
month of pounding on the server to steal enough data to matter, then I
think monitoring could be one required part of an information
protection strategy. However, if Joe, or someone with Joe's
credentials, can steal all the secret data in under an hour, the
monitoring system probably doesn't help much. A human being probably
won't react quickly enough to stop something bad from happening,
especially if the person with Joe's credentials begins the attack at
2am on Christmas.

More generally, I think it's difficult for us to build infrastructure
into PostgreSQL that relies on complex assumptions about what the
customer environment is. To some extent, we are already relying on
users to prevent certain types of attacks. For example, RLS supposes
that timing attacks or plan-shape based attacks won't be feasible, but
we don't do anything to prevent them; we just hope the user takes care
of it. That's already a shaky assumption, because timing attacks could
well be feasible across a fairly deep application stack e.g. the user
issues an HTTP query for a web page and can detect variations in the
underlying database query latency.

When you start proposing assumptions that the user can't execute DDL
or can't execute SQL queries or that there's monitoring of the error
log in place, I feel the whole thing gets very hard to reason about.
First, you have to decide on exactly what the assumptions are - no
DDL, no direct SQL at all, something else? Different situations could
exist for different users, so whatever assumption we make will not
apply to everyone. Second, for some of this stuff, there's a sliding
scale. If we stipulate that a user is going to need a monitoring
system, how good does that monitoring system have to be? What does it
have to catch, and how quickly are the humans required to respond? If
we stipulate that the attacker can't execute SQL directly, how much
control over the generated SQL are they allowed to have?

I don't want to make it sound like I think it's hopeless to come up
with something clever here. The current situation kind of sucks, and I
do have hopes that there are better ideas out there. At the same time,
we need to be able to articulate clearly what we are and are not
guaranteeing and under what set of assumptions, and it doesn't seem
easy to me to come up with something satisfying.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: can we mark upper/lower/textlike functions leakproof?

From
Jacob Champion
Date:
On Wed, Jul 31, 2024 at 1:26 PM Robert Haas <robertmhaas@gmail.com> wrote:
> However, the risk is that an end-user is going to be much less able to
> evaluate what is and isn't safe than we are. I think some people are
> going to be like -- well the core project doesn't mark enough stuff
> leakproof, so I'll just go add markings to a bunch of stuff myself.
> And they probably won't stop at stuff like UPPER which is almost
> leakproof. They might add it to stuff such as LIKE which results in
> immediately giving away the farm. By not giving people any guidance,
> we invite them to make up their own rules.

+1.

Would it provide enough value for effort to explicitly mark leaky
procedures as such? Maybe that could shrink the grey area enough to be
protective?

--Jacob



Re: can we mark upper/lower/textlike functions leakproof?

From
Jacob Champion
Date:
On Thu, Aug 1, 2024 at 7:26 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Are you proposing that we invent two levels of leakproofness
> with different guarantees?  That seems like a mess, not least
> because there are going to be varying opinions about where we
> should set the bar for the lower level.

It kind of reminds me of the kernel's "paranoia level" tunables, which
seem to proliferate in weird ways [1] and not make for a particularly
great UX.

--Jacob

[1] https://askubuntu.com/questions/1400874/what-does-perf-paranoia-level-four-do



Re: can we mark upper/lower/textlike functions leakproof?

From
Robert Haas
Date:
On Thu, Aug 1, 2024 at 4:45 PM Jacob Champion
<jacob.champion@enterprisedb.com> wrote:
> Would it provide enough value for effort to explicitly mark leaky
> procedures as such? Maybe that could shrink the grey area enough to be
> protective?

You mean like proleakproof = true/false/maybe?

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: can we mark upper/lower/textlike functions leakproof?

From
Jacob Champion
Date:
On Thu, Aug 1, 2024 at 6:03 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Thu, Aug 1, 2024 at 4:45 PM Jacob Champion
> <jacob.champion@enterprisedb.com> wrote:
> > Would it provide enough value for effort to explicitly mark leaky
> > procedures as such? Maybe that could shrink the grey area enough to be
> > protective?
>
> You mean like proleakproof = true/false/maybe?

Yeah, exactly.

--Jacob



Re: can we mark upper/lower/textlike functions leakproof?

From
Joe Conway
Date:
On 8/2/24 09:48, Jacob Champion wrote:
> On Thu, Aug 1, 2024 at 6:03 PM Robert Haas <robertmhaas@gmail.com> wrote:
>>
>> On Thu, Aug 1, 2024 at 4:45 PM Jacob Champion
>> <jacob.champion@enterprisedb.com> wrote:
>> > Would it provide enough value for effort to explicitly mark leaky
>> > procedures as such? Maybe that could shrink the grey area enough to be
>> > protective?
>>
>> You mean like proleakproof = true/false/maybe?
> 
> Yeah, exactly.

<dons flameproof suit>
Hmmm, and then have "leakproof_mode" = strict/lax/off where 'strict' is 
current behavior, 'lax' allows the 'maybe's to get pushed down, and 
'off' ignores the leakproof attribute entirely and pushes down anything 
that merits being pushed?
</dons flameproof suit>

-- 
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: can we mark upper/lower/textlike functions leakproof?

From
"David G. Johnston"
Date:
On Fri, Aug 2, 2024 at 6:58 AM Joe Conway <mail@joeconway.com> wrote:
On 8/2/24 09:48, Jacob Champion wrote:
> On Thu, Aug 1, 2024 at 6:03 PM Robert Haas <robertmhaas@gmail.com> wrote:
>>
>> On Thu, Aug 1, 2024 at 4:45 PM Jacob Champion
>> <jacob.champion@enterprisedb.com> wrote:
>> > Would it provide enough value for effort to explicitly mark leaky
>> > procedures as such? Maybe that could shrink the grey area enough to be
>> > protective?
>>
>> You mean like proleakproof = true/false/maybe?
>
> Yeah, exactly.

<dons flameproof suit>
Hmmm, and then have "leakproof_mode" = strict/lax/off where 'strict' is
current behavior, 'lax' allows the 'maybe's to get pushed down, and
'off' ignores the leakproof attribute entirely and pushes down anything
that merits being pushed?
</dons flameproof suit>


Another approach would be to do something like:

select "upper"!(column)

to demark within the query usage itself that this function with a maybe leakproof attribute gets interpreted as yes.

or even something like:

select "upper"[leakproof](column)

This is both more readable and provides a way, not that we seem to need it, to place more than one label (or just alternative labels using the same syntax construct, like with explain (...)) inside the "array".

Discoverability of when to add such a marker would be left up to the query author, with the safe default mode being a not leakproof interpretation.

David J.

Re: can we mark upper/lower/textlike functions leakproof?

From
Tom Lane
Date:
Joe Conway <mail@joeconway.com> writes:
> <dons flameproof suit>
> Hmmm, and then have "leakproof_mode" = strict/lax/off where 'strict' is 
> current behavior, 'lax' allows the 'maybe's to get pushed down, and 
> 'off' ignores the leakproof attribute entirely and pushes down anything 
> that merits being pushed?
> </dons flameproof suit>

So in other words, we might as well just remove RLS.

            regards, tom lane



Re: can we mark upper/lower/textlike functions leakproof?

From
Joe Conway
Date:
On 8/2/24 11:07, Tom Lane wrote:
> Joe Conway <mail@joeconway.com> writes:
>> <dons flameproof suit>
>> Hmmm, and then have "leakproof_mode" = strict/lax/off where 'strict' is 
>> current behavior, 'lax' allows the 'maybe's to get pushed down, and 
>> 'off' ignores the leakproof attribute entirely and pushes down anything 
>> that merits being pushed?
>> </dons flameproof suit>
> 
> So in other words, we might as well just remove RLS.

Perhaps deciding where to draw the line for 'maybe' is too hard, but I 
don't understand how you can say that in a general sense.

'strict' mode would provide the same guarantees as today. And even 'off' 
has utility for cases where (1) no logins are allowed except for the app 
(which is quite common in production environments) and no database 
errors are propagated to the end client (again pretty standard best 
practice); or (2) non-production environments, e.g. for testing or 
debugging; or (3) use cases that utilize RLS as a notationally 
convenient filtering mechanism and are not bothered by some leakage in 
the worst case.

-- 
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: can we mark upper/lower/textlike functions leakproof?

From
Jacob Champion
Date:
On Fri, Aug 2, 2024 at 9:13 AM Joe Conway <mail@joeconway.com> wrote:
>
> On 8/2/24 11:07, Tom Lane wrote:
> > Joe Conway <mail@joeconway.com> writes:
> >> <dons flameproof suit>
> >> Hmmm, and then have "leakproof_mode" = strict/lax/off where 'strict' is
> >> current behavior, 'lax' allows the 'maybe's to get pushed down, and
> >> 'off' ignores the leakproof attribute entirely and pushes down anything
> >> that merits being pushed?
> >> </dons flameproof suit>
> >
> > So in other words, we might as well just remove RLS.
>
> Perhaps deciding where to draw the line for 'maybe' is too hard, but I
> don't understand how you can say that in a general sense.

I'm more sympathetic to the "maybe" case, but for the "off" proposal I
find myself agreeing with Tom. If you want "off", turn off RLS.

> And even 'off'
> has utility for cases where (1) no logins are allowed except for the app
> (which is quite common in production environments) and no database
> errors are propagated to the end client (again pretty standard best
> practice);

I'm extremely skeptical of this statement, but I've been out of the
full-stack world for a bit. How does a modern best-practice
application hide the fact that it ran into an error and couldn't
complete a query?

> or (2) non-production environments, e.g. for testing or
> debugging; or

Changing the execution behavior between dev and prod seems like an
anti-goal. When would turning this off help you debug something?

> (3) use cases that utilize RLS as a notationally
> convenient filtering mechanism and are not bothered by some leakage in
> the worst case.

Catering to this use case doesn't seem like it should be a priority.
If a security feature is useful for you in a non-security setting,
awesome, but we shouldn't poke holes in it for that situation, nor
should it be surprising if the security gets stronger and becomes
harder to use for non-security.

--Jacob



Re: can we mark upper/lower/textlike functions leakproof?

From
Robert Haas
Date:
On Fri, Aug 2, 2024 at 11:07 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Joe Conway <mail@joeconway.com> writes:
> > <dons flameproof suit>
> > Hmmm, and then have "leakproof_mode" = strict/lax/off where 'strict' is
> > current behavior, 'lax' allows the 'maybe's to get pushed down, and
> > 'off' ignores the leakproof attribute entirely and pushes down anything
> > that merits being pushed?
> > </dons flameproof suit>
>
> So in other words, we might as well just remove RLS.

<stage-whisper>Hey, everybody, I don't think Tom likes the
proposal.</stage-whisper>

I'll be honest: I don't like it, either. I don't even like
proleakproof=true/false/maybe; I asked about that to understand if
that was what Jacob was proposing, not because I actually think we
should do it. The problem is that there's likely to be a fairly wide
range contained inside of "maybe", with cases like "upper" at the
safer end of the spectrum. That's too fuzzy to use as a basis for any
sort of real security, IMHO; we won't be able to find two hackers who
agree on how anything should be marked.

I think part of our problem here is that we have very few examples of
how to actually analyze a function for leakproof-ness, or how to
exploit one that is erroneously so marked. The conversations then tend
to degenerate into some people saying things are scary and some people
saying the scariness is overrated and then the whole thing just
becomes untethered from reality. Maybe we need to create some really
robust documentation in this area so that we can move toward a common
conceptual framework, instead of everybody just having a lot of
opinions.

I can't shake the feeling that if PostgreSQL got the same level of
attention from security researchers that Linux or OpenSSL do, this
would be a very different conversation. The fact that we have more
people complaining about RLS causing poor query performance than we do
about RLS leaking information is probably a sign that it's being used
to provide more security theatre than actual security. Even the leaks
we intended to have are pretty significant, and I'm sure that we have
some we didn't intend.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: can we mark upper/lower/textlike functions leakproof?

From
Jacob Champion
Date:
On Fri, Aug 2, 2024 at 9:22 AM Robert Haas <robertmhaas@gmail.com> wrote:
> I'll be honest: I don't like it, either. I don't even like
> proleakproof=true/false/maybe; I asked about that to understand if
> that was what Jacob was proposing, not because I actually think we
> should do it. The problem is that there's likely to be a fairly wide
> range contained inside of "maybe", with cases like "upper" at the
> safer end of the spectrum. That's too fuzzy to use as a basis for any
> sort of real security, IMHO; we won't be able to find two hackers who
> agree on how anything should be marked.

I guess I wasn't trying to propose that the grey area be used as the
basis for security, but that we establish a lower bound for the grey.
Make things strictly better than today, and cut down on the fear that
someone's going to accidentally mark something that we all agree
shouldn't be. And then shrink the grey area over time as we debate.

(Now, if there aren't that many cases where we can all agree on
"unsafe", then the proposal loses pretty much all value, because we'll
never shrink the uncertainty.)

> I think part of our problem here is that we have very few examples of
> how to actually analyze a function for leakproof-ness, or how to
> exploit one that is erroneously so marked. The conversations then tend
> to degenerate into some people saying things are scary and some people
> saying the scariness is overrated and then the whole thing just
> becomes untethered from reality. Maybe we need to create some really
> robust documentation in this area so that we can move toward a common
> conceptual framework, instead of everybody just having a lot of
> opinions.

+1

--Jacob



Re: can we mark upper/lower/textlike functions leakproof?

From
Robert Haas
Date:
On Fri, Aug 2, 2024 at 12:33 PM Jacob Champion
<jacob.champion@enterprisedb.com> wrote:
> (Now, if there aren't that many cases where we can all agree on
> "unsafe", then the proposal loses pretty much all value, because we'll
> never shrink the uncertainty.)

My belief is that nearly everything in unsafe. We ship with very
little marked leakproof right now, and that might be too conservative,
but probably not by much. For example:

-- +(int4, int4) is not leakproof
robert.haas=# select 2147483647::int4+1;
ERROR:  integer out of range

-- textcat is not leakproof
robert.haas=# create temp table x as select repeat('a',(2^29)::int4-2)::text a;
SELECT 1
robert.haas=# select length(a||a) from x;
ERROR:  invalid memory alloc request size 1073741824

-- absolute value is not leakproof
robert.haas=# select @(-2147483648);
ERROR:  integer out of range

-- textlike is not leakproof
robert.haas=# select 'a' ~ 'b\';
ERROR:  invalid regular expression: invalid escape \ sequence

-- division is not leakproof
robert.haas=# select 2/0;
ERROR:  division by zero

-- to_date is not leakproof
robert.haas=# select to_date('abc', 'def');
ERROR:  invalid value "a" for "d"
DETAIL:  Value must be an integer.

I think it would be safe to mark the bitwise xor operator for integers
as leakproof. But that isn't likely to be used in a query. Most of the
stuff that people actually use in queries isn't even arguably
leakproof.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: can we mark upper/lower/textlike functions leakproof?

From
Jacob Champion
Date:
On Fri, Aug 2, 2024 at 10:40 AM Robert Haas <robertmhaas@gmail.com> wrote:
> My belief is that nearly everything in unsafe. We ship with very
> little marked leakproof right now, and that might be too conservative,
> but probably not by much.

Then to me, that seems like the best-case scenario for a "maybe"
classification. I'm still not sold on the idea of automatically
treating all "maybes" as leakproof (personally I'd prefer that users
surgically opt in), but if the pool is small...

Thanks,
--Jacob