Thread: Row data is reflected in DETAIL message when constraints fail on insert/update

Row data is reflected in DETAIL message when constraints fail on insert/update

From: William Denton
When inserting or updating a row with not-null constraints that are not satisfied, Postgres reflects the values in the row in the error DETAIL.

postgres=# create database test;
CREATE DATABASE
postgres=# \c test;
psql (11.1 (Debian 11.1-3.pgdg90+1), server 11.2)
You are now connected to database "test" as user "postgres".
test=# create table person (firstname text not null, lastname text not null, email text not null);
CREATE TABLE
test=# insert into person values ('william', 'denton', null);
ERROR:  null value in column "email" violates not-null constraint
DETAIL:  Failing row contains (william, denton, null).
test=# insert into person values ('william', 'denton', 'email@example.com');
INSERT 0 1
test=# update person set email = null;
ERROR:  null value in column "email" violates not-null constraint
DETAIL:  Failing row contains (william, denton, null).
test=# 

Is there a setting where I can disable the DETAIL field being populated with row data?

Currently sensitive data (PII in the case illustrated above) is being leaked by the database engine and relayed up into my application, where it finds its way into application logs.

I have also opened a GitHub issue with Npgsql to see if it's possible to suppress this DETAIL field in exceptions, but it seems this is an issue that all DB drivers/clients will face. https://github.com/npgsql/npgsql/issues/2501

Being able to reflect out the data on a row without doing a select may be a security issue as well.

Thank you!

On 6/19/19 8:56 PM, William Denton wrote:
> When inserting or updating a row with not-null constraints that are not 
> satisfied, Postgres reflects the values in the row in the error DETAIL.
> 
> postgres=# create database test;
> CREATE DATABASE
> postgres=# \c test;
> psql (11.1 (Debian 11.1-3.pgdg90+1), server 11.2)
> You are now connected to database "test" as user "postgres".
> test=# create table person (firstname text not null, lastname text not 
> null, email text not null);
> CREATE TABLE
> test=# insert into person values ('william', 'denton', null);
> ERROR:  null value in column "email" violates not-null constraint
> DETAIL:  Failing row contains (william, denton, null).
> test=# insert into person values ('william', 'denton', 
> 'email@example.com');
> INSERT 0 1
> test=# update person set email = null;
> ERROR:  null value in column "email" violates not-null constraint
> DETAIL:  Failing row contains (william, denton, null).
> test=#
> 
> Is there a setting where I can disable the DETAIL field being populated 
> with row data?

See:

https://www.postgresql.org/docs/11/runtime-config-logging.html#RUNTIME-CONFIG-LOGGING-WHAT

log_error_verbosity
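
For example, something along these lines (an untested sketch) switches the
server to terse error output:

    -- superuser only; this governs what is written to the server log
    ALTER SYSTEM SET log_error_verbosity = 'terse';
    SELECT pg_reload_conf();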

> 
> Currently sensitive data (PII in the case illustrated above) is being 
> leaked by the database engine and relayed up into my application where 
> it finds its way into application logs.
> 
> I have also opened a GitHub issue with Npgsql to see if it's possible to 
> suppress this DETAIL field in exceptions, but it seems this is an issue 
> that all DB drivers/clients will face. 
> https://github.com/npgsql/npgsql/issues/2501
> 
> Being able to reflect out the data on a row without doing a select may 
> be a security issue as well.
> 
> Thank you!


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Shay here, maintainer of the Npgsql driver for .NET.

>> Is there a setting where I can disable the DETAIL field being populated
>> with row data?
>
> See:
>
> https://www.postgresql.org/docs/11/runtime-config-logging.html#RUNTIME-CONFIG-LOGGING-WHAT
> log_error_verbosity

While this is helpful, this does not seem to quite fit:

1. As this is about personal sensitive data (including conceivably authentication information), the fact that the default is to log seems problematic.
2. The TERSE setting also disables HINT, QUERY and CONTEXT.
3. There may be other information sent in the DETAIL messages which does not contain sensitive data. There's no reason to have that disabled along with the sensitive data.

 In other words, this isn't about verbosity, but about sensitive data. It seems like a specific knob for sensitive information may be required, which would be off by default and would potentially affect other fields as well (if relevant).

On Thu, Jun 20, 2019 at 01:26:23PM +0200, Shay Rojansky wrote:

>  In other words, this isn't about verbosity, but about sensitive data. It
> seems like a specific knob for sensitive information may be required, which
> would be off by default and would potentially affect other fields as well
> (if relevant).

A specific knob for "sensitive data" cannot be supplied by
PostgreSQL because it cannot know beforehand what information
will be considered sensitive under a given, future, usage
scenario.

Karsten
--
GPG  40BE 5B0E C98E 1713 AFA6  5BC0 3BEA AC80 7D4F C89B



On 6/20/19 4:26 AM, Shay Rojansky wrote:
> Shay here, maintainer of the Npgsql driver for .NET.
> 
>  >> Is there a setting where I can disable the DETAIL field being populated
>  >> with row data?
>  >
>  > See:
>  >
>  > 
> https://www.postgresql.org/docs/11/runtime-config-logging.html#RUNTIME-CONFIG-LOGGING-WHAT
>  >
>  > log_error_verbosity
> 
> While this is helpful, this does not seem to quite fit:
> 
> 1. As this is about personal sensitive data (including conceivably 
> authentication information), the fact that the default is to log seems 
> problematic.
> 2. The TERSE setting also disables HINT, QUERY and CONTEXT.
> 3. There may be other information sent in the DETAIL messages which does 
> not contain sensitive data. There's no reason to have that disabled 
> along with the sensitive data.
> 
>   In other words, this isn't about verbosity, but about sensitive data. 
> It seems like a specific knob for sensitive information may be required, 
> which would be off by default and would potentially affect other fields 
> as well (if relevant).

As Karsten said, that is beyond the scope of Postgres logging. The 
prudent thing would be to prevent the log information from reaching the 
application logs, or to put it in a log that can only be seen by 
authorized personnel.

-- 
Adrian Klaver
adrian.klaver@aklaver.com



Karsten Hilbert <Karsten.Hilbert@gmx.net> writes:
> On Thu, Jun 20, 2019 at 01:26:23PM +0200, Shay Rojansky wrote:
>> In other words, this isn't about verbosity, but about sensitive data. It
>> seems like a specific knob for sensitive information may be required, which
>> would be off by default and would potentially affect other fields as well
>> (if relevant).

> A specific knob for "sensitive data" cannot be supplied by
> PostgreSQL because it cannot know beforehand what information
> will be considered sensitive under a given, future, usage
> scenario.

Yeah, it's fairly hard to see how we could respond to this complaint
without lobotomizing our error messages to the point of near uselessness.
Almost any non-constant text in an error report could possibly be seen
as hazardous.

More generally: I find this complaint a little confusing.  We did not
consider reporting the "show row contents" DETAIL to the client to be a
security hazard when it was added, because one would think that that's
just data that the client already knows anyway.  I'd be interested to see
a plausible use-case in which the message would reflect PII that had not
been supplied by or available to the client.

            regards, tom lane



Karsten,

>>  In other words, this isn't about verbosity, but about sensitive data. It
>> seems like a specific knob for sensitive information may be required, which
>> would be off by default and would potentially affect other fields as well
>> (if relevant).
>
> A specific knob for "sensitive data" cannot be supplied by
> PostgreSQL because it cannot know beforehand what information
> will be considered sensitive under a given, future, usage
> scenario.

It seems generally agreed that all data from the database should be considered potentially sensitive and should therefore not be leaked in log messages - unless an explicit, informed opt-in is done. It is extremely easy to imagine a (poorly-written) UI or web application which simply surfaces database exceptions, allowing attackers to potentially extract data from the database. In the worst case, passwords and other auth information may get exposed in this way, but even any sort of personal information is a big problem.

It seems worth at least having a conversation about it...

On Thu, Jun 20, 2019 at 05:22:20PM +0200, Shay Rojansky wrote:

> It seems generally agreed that all data from the database should be
> considered potentially sensitive and should therefore not be leaked in log
> messages - unless an explicit, informed opt-in is done. It is extremely
> easy to imagine a (poorly-written) UI or web application which simply
> surfaces database exceptions, allowing attackers to potentially extract
> data from the database. In the worst case, passwords and other auth
> information may get exposed in this way, but even any sort of personal
> information is a big problem.
>
> It seems worth at least having a conversation about it...

Sure, but we are currently exploring whether the database
reflects any values that it had not been given by the same
user beforehand.

There might be another scenario:

    user enters value for column 1

    app adds in secret-to-the-user value for column 2

    UPDATE fails

    error message reflects val 1 and secret val 2

    app displays both values

    user knows secret value 2

but I don't see how PostgreSQL can do anything *reasonable*
about that short of sitting tight-and-mum and not reflecting
much of *anything* beyond "error". And even that can be a
side channel.

Karsten
--
GPG  40BE 5B0E C98E 1713 AFA6  5BC0 3BEA AC80 7D4F C89B




>> A specific knob for "sensitive data" cannot be supplied by
>> PostgreSQL because it cannot know beforehand what information
>> will be considered sensitive under a given, future, usage
>> scenario.

> Yeah, it's fairly hard to see how we could respond to this complaint
> without lobotomizing our error messages to the point of near uselessness.
> Almost any non-constant text in an error report could possibly be seen
> as hazardous.

I don't think that's true - schema information (table, column names) definitely seems like it's in a different category than table contents.

> More generally: I find this complaint a little confusing.  We did not
> consider reporting the "show row contents" DETAIL to the client to be a
> security hazard when it was added, because one would think that that's
> just data that the client already knows anyway.  I'd be interested to see
> a plausible use-case in which the message would reflect PII that had not
> been supplied by or available to the client.

I'm imagining two main scenarios here.

First, there are many lazily-written applications out there which simply show exception/error messages to users. So user A could interact with a website in a way that triggers a unique constraint violation, and thereby get access to the data in the row which caused the violation.

Second, such exceptions and errors frequently get logged (e.g. to disk). Logs in general aren't kept as secure as database data itself (who knows where they end up and who handles them).

In this day and age of increased attention to personal information it seems quite risky to be sending row contents via error messages without an opt-in...





Karsten Hilbert <Karsten.Hilbert@gmx.net> writes:
> Sure, but we are currently exploring whether the database
> reflects any values that it had not been given by the same
> user beforehand.

> There might be another scenario:
>     user enters value for column 1
>     app adds in secret-to-the-user value for column 2
>     UPDATE fails
>     error message reflects val 1 and secret val 2
>     app displays both values
>     user knows secret value 2

I wondered about this, but a test case did not show it to be possible,
and after some looking around I found the responsible code.  The
"Failing row contains ..." message is built by execMain.c's
ExecBuildSlotValueDescription() function, and it will only print columns
for which (a) the value was supplied in the current INSERT or UPDATE
command, or (b) the calling user has privileges to SELECT that column.
So we have expended at least some effort towards blocking that sort of
hole.

Admittedly, in your example there's a difference between what "the app"
should know and what "the user using the app" should know.  But I'm not
really seeing how Postgres could usefully model that situation.  We have
no idea about the structure of the client-side logic.

BTW, I notice that ExecBuildSlotValueDescription punts and never prints
anything if the table in question has RLS enabled.  So maybe a workable
kluge for the OP is to enable RLS?
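
A rough sketch of that kluge (untested here, and assuming the application
connects as a role other than the table owner, since owners bypass RLS
unless it is forced):

    ALTER TABLE person ENABLE ROW LEVEL SECURITY;
    -- a permissive catch-all policy so the app role can still read/write everything
    CREATE POLICY person_allow_all ON person FOR ALL USING (true) WITH CHECK (true);
    -- with RLS in effect, the "Failing row contains ..." detail is no longer emitted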

            regards, tom lane



Shay Rojansky <roji@roji.org> writes:
>> Yeah, it's fairly hard to see how we could respond to this complaint
>> without lobotomizing our error messages to the point of near uselessness.
>> Almost any non-constant text in an error report could possibly be seen
>> as hazardous.

> I don't think that's true - schema information (table, column names)
> definitely seems like it's in a different category than table contents.

You're not visualizing this with the appropriate amount of paranoia: just
because the database thinks that something is a table or column name
doesn't mean that that's what it actually is.  A screw-up in query syntax
could lead to reporting back something that was really a data value.
Admittedly, that's not very likely given a well-debugged application
issuing the queries, but it's not something to ignore either.

> First, there are many lazily-written applications out there which simply
> show exception/error messages to users. So user A could interact with a
> website in a way that triggers a unique constraint violation, and thereby
> get access to the data in the row which caused the violation.

Well, that's just bad webapp design ...

> Second, such exceptions and errors frequently get logged (e.g. to disk).
> Logs in general aren't kept as secure as database data itself (who knows
> where they end up and who handles them).

Yeah.  We more commonly see this type of complaint in the alternate guise
of "you're logging sensitive information in the postmaster log!  I can't
accept that!".  We've basically established a project policy that the
postmaster log has to be secured to exactly the same extent as the
database files themselves, since otherwise there's no certainty that it
won't leak data you don't want leaked.  On the whole, I think the right
response to this complaint is that equal attention has to be given to
securing error logs on the client side.

            regards, tom lane



> More generally: I find this complaint a little confusing.  We did not
> consider reporting the "show row contents" DETAIL to the client to be a
> security hazard when it was added, because one would think that that's
> just data that the client already knows anyway.  I'd be interested to see
> a plausible use-case in which the message would reflect PII that had not
> been supplied by or available to the client.

I had the same issue with pgaudit, which was spilling PHI data into PG logs that we
were feeding to sumologic.  I had to write a Python masking program to strip out
literal values from the PG log.



Re: Row data is reflected in DETAIL message when constraints fail on insert/update

From: "David G. Johnston"
On Thu, Jun 20, 2019 at 9:17 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Karsten Hilbert <Karsten.Hilbert@gmx.net> writes:
> Sure, but we are currently exploring whether the database
> reflects any values that it had not been given by the same
> user beforehand.

> There might be another scenario:
>       user enters value for column 1
>       app adds in secret-to-the-user value for column 2
>       UPDATE fails
>       error message reflects val 1 and secret val 2
>       app displays both values
>       user knows secret value 2

I wondered about this, but a test case did not show it to be possible,
and after some looking around I found the responsible code.  The
"Failing row contains ..." message is built by execMain.c's
ExecBuildSlotValueDescription() function, and it will only print columns
for which (a) the value was supplied in the current INSERT or UPDATE
command, or (b) the calling user has privileges to SELECT that column.
So we have expended at least some effort towards blocking that sort of
hole.


Just to be clear here, the OP provided the following query example:

test=# update person set email = null;
ERROR:  null value in column "email" violates not-null constraint
DETAIL:  Failing row contains (william, denton, null).

The presence of william and denton in the error detail was because the user updating the table has select access on first and last name.  If they did not, those fields would not have been part of the error message?  I'm not in a position to experiment right now, but what does/should it show in the restrictive case?

David J.
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> Just to be clear here, the OP provided the following query example:

> test=# update person set email = null;
> ERROR:  null value in column "email" violates not-null constraint
> DETAIL:  Failing row contains (william, denton, null).

> The presence of william and denton in the error detail was because the user
> updating the table has select access on first and last name.  If they did
> not those fields would not have been part of the error message?  I'm not in
> a position to experiment right now but what does/should it show in the
> restrictive case?

regression=# create user joe;
CREATE ROLE
test=# create table person(first text, last text, email text not null);
CREATE TABLE
test=# grant select(email),update(email) on person to joe;
GRANT
test=# insert into person values('william','denton','wd40@gmail.com');
INSERT 0 1
test=# \c - joe
You are now connected to database "test" as user "joe".
test=> update person set email = null;
psql: ERROR:  null value in column "email" violates not-null constraint
DETAIL:  Failing row contains (email) = (null).

The DETAIL in this case would be the same whether joe had select(email)
privilege or not; the email value is considered OK to report since it
came from the query not the table.

            regards, tom lane



On Thu, Jun 20, 2019 at 12:16:53PM -0400, Tom Lane wrote:

> Admittedly, in your example there's a difference between what "the app"
> should know and what "the user using the app" should know.  But I'm not
> really seeing how Postgres could usefully model that situation.  We have
> no idea about the structure of the client-side logic.

Absolutely. Perhaps another solution to that problem would be
for OP to tell PG about the desired client-side logic by
wrapping all data access into views and/or functions (cf data
masking).
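
For instance, a wrapper along these lines (only a sketch; the function and
its name are made up here) would keep the row contents out of the error the
application sees while preserving the error code:

    CREATE FUNCTION insert_person(_first text, _last text, _email text)
    RETURNS void LANGUAGE plpgsql AS $$
    BEGIN
        INSERT INTO person VALUES (_first, _last, _email);
    EXCEPTION
        WHEN not_null_violation OR unique_violation THEN
            -- re-raise with the original SQLSTATE but without the row data
            RAISE EXCEPTION 'constraint violation while inserting into person'
                USING ERRCODE = SQLSTATE;
    END;
    $$;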

I guess the original question basically boils down to "Given a
rogue/dumb app, and a DBA who neglected his job, is it PG's
business (or even within its possibilities) to mop up ?"

I'd be inclined to say No.

I would agree it is not entirely trivial to accept
that resolve, though.

Karsten
--
GPG  40BE 5B0E C98E 1713 AFA6  5BC0 3BEA AC80 7D4F C89B



>I guess the original question basically boils down to "Given a
>rogue/dumb app, and a DBA who neglected his job, is it PG's
>business (or even within its possibilities) to mop up ?"

It feels like you aren't setting people up to land in the pit of success. 
It's easy to sit back and call people negligent because they failed to 
change settings from their defaults. Data breaches are all too common 
due to mis-configured systems, we can all have a good laugh at the 
poor people who have suffered breaches due to defaults that 
come/came with s3, mongo and many other data stores, but why must 
we operate on that level rather than being a little more defensive?

How is it useful in a normally configured database to return row data in
error messages?

Is the client application supposed to parse that data?

Must the client perform another query to discover column names and 
attributes so the data can be parsed?

I can definitely see a use for it during debugging and development where a 
human has their eyes on what the database is returning, but I would argue 
if you wanted that information for debugging purposes you would enable 
verbose logging.

I have spent a few minutes searching google for terms like 
"harden postgres for production" or "locking down postgres" or  
"postgres production configuration". NONE mention log_error_verbosity. 
Searching the postgres wiki yields no results for log_error_verbosity. Only 
once you start searching for the problems caused by log_error_verbosity 
can you become aware that this setting exists and should be changed in 
production environments. Yet the only mention of this parameter on any 
postgres site (docs or wiki) is the one pasted below. Calling people negligent 
for not knowing something, when you have failed to tell them, seems disingenuous.

Further, the documentation for log_error_verbosity mentions nothing about the 
data returned to the client. This text is explicitly talking about the server log.

>Controls the amount of detail written in the server log for each message that is 
>logged. Valid values are TERSE, DEFAULT, and VERBOSE, each adding more 
>fields to displayed messages. TERSE excludes the logging of DETAIL, 
>HINT, QUERY, and CONTEXT error information. VERBOSE output includes the 
>SQLSTATE error code (see also Appendix A) and the source code file name, 
>function name, and line number that generated the error. Only superusers can 
>change this setting.

I would suggest that row data should be reclassified as only appearing in 
VERBOSE configurations, as there is nothing an application client can do with 
that information; it is only useful to a human operating interactively with the db.

Cheers,

William

On Sat, 22 Jun 2019 at 20:40, Karsten Hilbert <Karsten.Hilbert@gmx.net> wrote:
On Thu, Jun 20, 2019 at 12:16:53PM -0400, Tom Lane wrote:

> Admittedly, in your example there's a difference between what "the app"
> should know and what "the user using the app" should know.  But I'm not
> really seeing how Postgres could usefully model that situation.  We have
> no idea about the structure of the client-side logic.

Absolutely. Perhaps another solution to that problem would be
for OP to tell PG about the desired client-side logic by
wrapping all data access into views and/or functions (cf data
masking).

I guess the original question basically boils down to "Given a
rogue/dumb app, and a DBA who neglected his job, is it PG's
business (or even within its possibilities) to mop up ?"

I'd be inclined to say No.

I would agree it is not entirely trivial to accept
that resolve, though.

Karsten
--
GPG  40BE 5B0E C98E 1713 AFA6  5BC0 3BEA AC80 7D4F C89B




On 6/22/19 6:00 AM, William Denton wrote:
>>I guess the original question basically boils down to "Given a
>>rogue/dumb app, and a DBA who neglected his job, is it PG's
>>business (or even within its possibilities) to mop up ?"
> 
> It feels like you aren't setting people up to land in the pit of success. 
> It's easy to sit back and call people negligent because they failed to 
> change settings from their defaults. Data breaches are all too common 
> due to mis-configured systems, we can all have a good laugh at the 
> poor people who have suffered breaches due to defaults that 
> come/came with s3, mongo and many other data stores, but why must 
> we operate on that level rather than being a little more defensive?
> 
> How is it useful in a normally configured database to return row data in
> error messages?
> 
> Is the client application supposed to parse that data?
> 
> Must the client perform another query to discover column names and 
> attributes so the data can be parsed?

+many

I agree entirely.

> I can definitely see a use for it during debugging and development where a 
> human has their eyes on what the database is returning, but I would argue 
> if you wanted that information for debugging purposes you would enable 
> verbose logging.

Exactly.

In dev and test environments it makes sense to send verbose messages to the
client. In production it does not.

> I have spent a few minutes searching google for terms like 
> "harden postgres for production" or "locking down postgres" or  
> "postgres production configuration". NONE mention log_error_verbosity. 
> Searching the postgres wiki yields no results for log_error_verbosity. Only 
> once you start searching for the problems caused by log_error_verbosity 
> can you become aware that this setting exists and should be changed in 
> production environments. Yet the only mention of this parameter on any
> postgres site (docs or wiki) is the one pasted below. Calling people
> negligent for not knowing something, when you have failed to tell them,
> seems disingenuous.
> 
> Further, the documentation for log_error_verbosity mentions nothing about
> the data returned to the client. This text is explicitly talking about the
> server log.
> 
>> Controls the amount of detail written in the server log for each message
>> that is logged. Valid values are TERSE, DEFAULT, and VERBOSE, each adding
>> more fields to displayed messages. TERSE excludes the logging of DETAIL,
>> HINT, QUERY, and CONTEXT error information. VERBOSE output includes the
>> SQLSTATE error code (see also Appendix A) and the source code file name,
>> function name, and line number that generated the error. Only superusers
>> can change this setting.
> 
> I would suggest that row data should be reclassified as only appearing in
> VERBOSE configurations, as there is nothing an application client can do
> with that information; it is only useful to a human operating interactively
> with the db.

In my experience the log files are generally required to be locked down
similarly to postgres itself, as was pointed out by (I think) Tom
elsewhere on this thread. To me, logging these details to the log file
is not the issue. And log files can be controlled in such a way as to
protect them even from the superuser and/or be shipped off to a log
collector for audit purposes.

The issue is the level of detail sent to the client. There is a setting,
client_min_messages, which could be used as a broad hammer to clamp what
gets sent to the client except that it is user settable. I think if we
made that superuser set instead it might at least be a step in the right
direction, although I have not tested to see what the results of an
error look like at the client in that case.

Separately I previously submitted a patch to optionally redact the
messages that go to the client. That would allow the client app to get
the SQL error code at least, but not the message. But it was rejected in
the last commitfest. See the relevant thread here:

https://www.postgresql.org/message-id/flat/2811772.0XtDgEdalL@peanuts2

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



Re: Row data is reflected in DETAIL message when constraints fail on insert/update

From: "Peter J. Holzer"
On 2019-06-22 22:00:12 +1200, William Denton wrote:
> >I guess the original question basically boils down to "Given a
> >rogue/dumb app, and a DBA who neglected his job, is it PG's
> >business (or even within its possibilities) to mop up ?"
>
> It feels like you aren't setting people up to land in the pit of success. 
> It's easy to sit back and call people negligent because they failed to 
> change settings from their defaults. Data breaches are all too common 
> due to mis-configured systems, we can all have a good laugh at the 
> poor people who have suffered breaches due to defaults that 
> come/came with s3, mongo and many other data stores, but why must 
> we operate on that level rather than being a little more defensive?
>
> How is it useful in a normally configured database to return row data in
> error messages?

This is extremely useful. It tells you what data didn't match your
program's expectations. Otherwise you just get a vague "unique
constraint violation" and you can then search through a hundred million
rows of data to find that violation.
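
For instance (a made-up session, not data from this thread), the detail names
the offending key immediately:

    create unique index on person (email);
    insert into person values ('ann', 'example', 'dup@example.com');
    insert into person values ('bob', 'example', 'dup@example.com');
    -- the second insert fails with, roughly:
    --   ERROR:  duplicate key value violates unique constraint "person_email_idx"
    --   DETAIL:  Key (email)=(dup@example.com) already exists.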

Useful error messages are one of the reasons why I prefer Postgres over
Oracle.

Actually, I would like to get the input data which contributed to the
row which caused that error, but this is of course not trivial (there
are a few fascinating research papers about the topic).

> Is the client application supposed to parse that data?

Nope. The client application should show it to the user or log it
somewhere where an authorized person can find it. "Something didn't
work, please ask your system administrator" is not an adequate error
message if the system administrator has no way to get additional
information.

        hp

--
   _  | Peter J. Holzer    | we build much bigger, better disasters now
|_|_) |                    | because we have much more sophisticated
| |   | hjp@hjp.at         | management tools.
__/   | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/>

On Sat, Jun 22, 2019 at 06:40:10PM +0200, Peter J. Holzer wrote:

> > How is it useful in a normally configured database to return row data in
> > error messages?
>
> This is extremely useful. It tells you what data didn't match your
> program's expectations. Otherwise you just get a vague "unique
> constraint violation"

Sure, except some argue that PG not send such information to
the *client* by *default*, which seems to have some merit
(the default should, however, keep logging such data to the
PG log)

This can lead to the following problem:

> and you can then search through a hundred million
> rows of data to find that violation.

which could be solved by passing the client an identifier
instead of the row data; the identifier would also be logged to
the server log alongside the row data. The combination of

    %m or %n - timestamp
    %c - session ID
    %l - in-session log line idx
    %e - SQLSTATE

would probably suffice if sent to the client, given it is
logged in the server log.

(not that I suggest any such thing as I certainly lack the
skills to provide a patch)
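
(For the server-log half of that, a prefix roughly like the following --
just a sketch, not tested -- would stamp each log entry with those fields:

    ALTER SYSTEM SET log_line_prefix = '%m [%c:%l] %e ';
    SELECT pg_reload_conf();
)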

Karsten
--
GPG  40BE 5B0E C98E 1713 AFA6  5BC0 3BEA AC80 7D4F C89B



On 6/22/19 10:09 AM, Karsten Hilbert wrote:
> On Sat, Jun 22, 2019 at 06:40:10PM +0200, Peter J. Holzer wrote:
> 
>>> How is it useful in a normally configured database to return row data in
>>> error messages?
>>
>> This is extremely useful. It tells you what data didn't match your
>> program's expectations. Otherwise you just get a vague "unique
>> constraint violation"
> 
> Sure, except some argue that PG not send such information to
> the *client* by *default*, which seems to have some merit
> (the default should, however, keep logging such data to the
> PG log)

Two points:

1) From Tom Lane's post upstream, the client is the one sending the data 
to the server, so it/they already know what it is.

2) Defining the client. In most non-trivial cases there is a stack of 
clients. For example, in the Django framework I use when connecting to a 
Postgres db there is:
a) psycopg2 -- the client that the message is actually going to.
b) The ORM -- the client that sits above a), though it is possible to 
bypass this level.
c) The views -- the clients that sit above a) and b).

When deploying I ensure the DEBUG setting is set to False to ensure the 
error does not bubble up from a) to the end user that is looking at the 
output of c). I do capture the errors and log them (to a secure file) for 
use in identifying issues. I also pop up a generic message to the end 
user, based off the db error, to give them context for why some action did 
not happen, e.g. a duplicate item (key). Therefore, for my use cases, the 
detailed information being sent to the low-level client (psycopg2) is 
very useful and essential to fixing problems.

What it comes down to is that security is situation-specific and ever 
changing. Depending on a generic program, be it a database or framework 
or something else, to anticipate all your requirements is unrealistic and 
ultimately insecure.

> 
> This can lead to the following problem:
> 
>> and you can then search through a hundred million
>> rows of data to find that violation.
> 
> which could be solved by passing to the client an identifier
> instead of the row data which is also logged to the server
> log alongside the row data. The combination of
> 
>     %m or %n - timestamp
>     %c - session ID
>     %l - in-session log line idx
>     %e - SQLSTATE
> 
> would probably suffice if sent to the client, given it is
> logged in the server log.
> 
> (not that I suggest any such thing as I certainly lack the
> skills to provide a patch)
> 
> Karsten
> --
> GPG  40BE 5B0E C98E 1713 AFA6  5BC0 3BEA AC80 7D4F C89B
> 
> 
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: Row data is reflected in DETAIL message when constraints fail on insert/update

From: "Peter J. Holzer"
On 2019-06-22 19:09:41 +0200, Karsten Hilbert wrote:
> On Sat, Jun 22, 2019 at 06:40:10PM +0200, Peter J. Holzer wrote:
> > > How is it useful in a normally configured database to return row data in
> > > error messages?
> >
> > This is extremely useful. It tells you what data didn't match your
> > program's expectations. Otherwise you just get a vague "unique
> > constraint violation"
>
> Sure, except some argue that PG not send such information to
> the *client* by *default*, which seems to have some merit
> (the default should, however, keep logging such data to the
> PG log)

Well, William didn't talk about a default. He asked "how is it useful in
a normally configured database", implying that this would never be
useful in production, only for development and debugging.

I strongly disagree with this. When my (non-developer) colleagues down
the hall import their data, I want them to get helpful error messages,
so that they can check their data and possibly fix it before bothering
me. There is no possible data leak - it is their data, they already know
it. But I don't necessarily want to give them access to the postgres
log: That might contain information that they are not supposed to see.

The database cannot know whether data is sensitive or not. The
application can, and it is up to the application to handle sensitive
data appropriately. The database already takes care of not divulging
data that the application couldn't access anyway. So removing this from
the error message doesn't reduce the data that could potentially be
leaked - but it shifts the burden either to the developer (who would
probably have a much harder time recreating the information and
probably won't) or to operations, which has to think about giving access
to server log files, etc.

There may be situations where removing this information from the
logs makes sense - I am not convinced that it even improves security in
the majority of cases.

        hp

--
   _  | Peter J. Holzer    | we build much bigger, better disasters now
|_|_) |                    | because we have much more sophisticated
| |   | hjp@hjp.at         | management tools.
__/   | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/>


Shay Rojansky <roji@roji.org> writes:

>> I don't think that's true - schema information (table, column names)
>> definitely seems like it's in a different category than table contents.
>
> You're not visualizing this with the appropriate amount of paranoia: just
> because the database thinks that something is a table or column name
> doesn't mean that that's what it actually is.  A screw-up in query syntax
> could lead to reporting back something that was really a data value.
> Admittedly, that's not very likely given a well-debugged application
> issuing the queries, but it's not something to ignore either.

Good point, thanks. I guess the main thing I'm trying to say is that in the standard, "default" scenario we should consider doing our best to avoid data leaks, rather than approaching this as a 100% all-or-nothing problem...

>> First, there are many lazily-written applications out there which simply
>> show exception/error messages to users. So user A could interact with a
>> website in a way that triggers a unique constraint violation, and thereby
>> get access to the data in the row which caused the violation.
>
> Well, that's just bad webapp design ...

True, but as this is a potentially critical security/privacy concern it seems reasonable to do what we can to prevent these leaks, even if ultimately they are the app developer's fault/responsibility.

Basically at the moment there's a "pit of failure" where an inexperienced dev doing a standard application has a high chance of encountering this issue, and unfortunately most of the Internet is made up of this scenario. I'm only proposing to protect against the obvious/easy data leak scenarios via an opt-in.