Thread: auditing in PostgreSQL
I already know it's possible to audit changes to data in PostgreSQL tables using triggers, etc.

A lot of other things can also be logged using the logging mechanism, such as permission errors (by logging all error messages), etc.

However, there are also other things that would be useful to audit, such as data being _read_. For instance, if it's normal for a certain user to read data once per month (running a report, etc.), and that same user reads the same data at an unexpected time, that may reveal a security problem.

I could wrap the table in an SRF that emits a LOG, but that is not very elegant, and the SRF may not perform well because the query could not be optimized the same way. It would also be nice if there were a more unified and complete way of doing this, rather than trying to separate the audit logs from the rest of the logs after the fact. And there is also no way to audit reads on, for example, all objects within a schema or tablespace. And the logging mechanism doesn't have a lot of conditionals, so it's hard to log only statements by privileged users.

I'm sure this has been discussed before, so I'd appreciate links to prior discussions, etc.

Regards,
Jeff Davis
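P.S. By "wrap the table in an SRF", I mean something along these lines (the table and column names are made up for illustration; the real thing would of course be written against the actual schema):

create table accounts (id int, owner text, balance numeric);

create function audited_accounts() returns setof accounts as $$
declare
    r accounts;
begin
    -- the side effect I'm after: record the read in the server log
    raise log 'accounts read by %', current_user;
    for r in select * from accounts loop
        return next r;
    end loop;
    return;
end;
$$ language plpgsql;

-- readers would query the wrapper instead of the table, e.g.:
-- select * from audited_accounts() where balance > 10000;

The WHERE clause in that last example can't be pushed down into the function, which is the kind of missed optimization I'm worried about.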
On Thu, 2007-08-30 at 15:39 -0600, Guy Fraser wrote:
> Below is the logging section from the postgresql.conf file. It
> would appear that you can configure PostgreSQL to log as much
> detail as you want to where you want. You can then write a
> program to parse the log file and present the information you
> want based on your needs. I do something similar with a different
> application which I have configured to use syslog. In syslog
> I direct the logging data to a pipe which I read as a stream from
> an application I wrote, that processes the realtime activity and
> extracts the useful information which I send to an SQL database
> for further processing on a batch basis.

Capturing everything possible via logging and filtering/processing it later was a consideration of mine. It might work, but it's not ideal. I'm a little concerned about it for a few reasons:

1. Performance (although I haven't measured it).

2. Trying to figure out which tables are actually being read by grepping the logs is a mess. What if someone makes a rule/view/function over the table (and they have read permission on the table), and then reads from that? There may even be built-in functions that could accomplish that, as long as the user has read access to the table.

3. I'd have to make the schema or table name unique enough that filtering wouldn't produce false positives. Solvable, but not an elegant solution either.

My concern is that logging is for logging, not auditing. There's some overlap, but logging doesn't seem to do everything that I need directly.

Regards,
Jeff Davis
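P.S. To make point 2 concrete, with made-up names: a user who has SELECT on the sensitive table (and CREATE on some schema) can do something like

create view innocuous_name as select * from payroll;
select * from innocuous_name;

and from then on the logged statement text never mentions "payroll" at all, so grepping the logs for the table name quietly misses those reads.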
On 8/30/07, Jeff Davis <pgsql@j-davis.com> wrote:
> I already know it's possible to audit changes to data in PostgreSQL
> tables using triggers, etc.
>
> A lot of other things can also be logged using the logging mechanism,
> such as permission errors (by logging all error messages), etc.
>
> However, there are also other things that would be useful to audit,
> such as data being _read_. For instance, if it's normal for a certain
> user to read data once per month (running a report, etc.), and that same
> user reads the same data at an unexpected time, that may reveal a
> security problem.
>
> I could wrap the table in an SRF that emits a LOG, but that is not very
> elegant, and the SRF may not perform well because the query could not
> be optimized the same way. It would also be nice if there were a more
> unified and complete way of doing this, rather than trying to
> separate the audit logs from the rest of the logs after the fact. And
> there is also no way to audit reads on, for example, all objects within
> a schema or tablespace. And the logging mechanism doesn't have a lot of
> conditionals, so it's hard to log only statements by privileged users.

Well, an SRF may be unsuitable for various reasons, but maybe views are better. I really like views more and more lately (better than functions as a rule, I think).

you have some query, select yadda
create view log_yadda as
select yadda
union all select null, null, null from log_func();

This is a nearly free invocation, but not perfect... a LIMIT clause can prevent log_func from executing in some cases, for example. I'm looking for a better way to express this.

merlin
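P.S. spelled out with made-up names (assume some table accounts(id, owner, balance)). log_func() returns no rows, so the union all branch adds nothing to the output, only the logging side effect:

create function log_func() returns setof int as $$
begin
    raise log 'accounts read by %', current_user;
    return;
end;
$$ language plpgsql;

create view log_accounts as
select id, owner, balance from accounts
union all
select null, null, null from log_func();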
On Thu, 2007-08-30 at 21:43 -0400, Merlin Moncure wrote:
> Well, an SRF may be unsuitable for various reasons, but maybe views are
> better. I really like views more and more lately (better than
> functions as a rule, I think).
>
> you have some query, select yadda
> create view log_yadda as
> select yadda
> union all select null, null, null from log_func();
>

Interesting idea, I hadn't thought of that. Not perfect, but interesting.

The RETURN QUERY feature might help reduce the penalty of using an SRF. Maybe I'll look into that.

Regards,
Jeff Davis
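P.S. Concretely, that would let me replace the RETURN NEXT loop in a wrapper function with a single statement, roughly like this (made-up names again, and it assumes a server version that has RETURN QUERY):

create or replace function audited_accounts() returns setof accounts as $$
begin
    raise log 'accounts read by %', current_user;
    return query select * from accounts;
end;
$$ language plpgsql;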
On 8/31/07, Jeff Davis <pgsql@j-davis.com> wrote:
> On Thu, 2007-08-30 at 21:43 -0400, Merlin Moncure wrote:
> > Well, an SRF may be unsuitable for various reasons, but maybe views are
> > better. I really like views more and more lately (better than
> > functions as a rule, I think).
> >
> > you have some query, select yadda
> > create view log_yadda as
> > select yadda
> > union all select null, null, null from log_func();
> >
>
> Interesting idea, I hadn't thought of that. Not perfect, but
> interesting.

create function func() returns bool as
$$
begin
raise notice 'log!';
return true;
end;
$$ language plpgsql;

how about this:
create view log_yadda as
select yadda where (select func());

the parens around the function call force it to be evaluated as a scalar. :-)

merlin
On 8/31/07, Merlin Moncure <mmoncure@gmail.com> wrote:
> On 8/31/07, Jeff Davis <pgsql@j-davis.com> wrote:
> > On Thu, 2007-08-30 at 21:43 -0400, Merlin Moncure wrote:
> > > Well, an SRF may be unsuitable for various reasons, but maybe views are
> > > better. I really like views more and more lately (better than
> > > functions as a rule, I think).
> > >
> > > you have some query, select yadda
> > > create view log_yadda as
> > > select yadda
> > > union all select null, null, null from log_func();
> > >
> >
> > Interesting idea, I hadn't thought of that. Not perfect, but
> > interesting.
>
> create function func() returns bool as
> $$
> begin
> raise notice 'log!';
> return true;
> end;
> $$ language plpgsql;
>
> how about this:
> create view log_yadda as
> select yadda where (select func());
>
> the parens around the function call force it to be evaluated as a scalar.

if you declare func() immutable, you can (maybe) remove the parens because the planner folds the call to a constant. This is faster in some cases because

select yadda where true

is slightly faster than

select yadda where (select true)

I'm not completely sure this will prevent multiple executions of func in some cases, however.

merlin
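P.S. to spell that out (accounts is a made-up table name), the immutable variant would be:

create or replace function func() returns bool as $$
begin
raise notice 'log!';
return true;
end;
$$ language plpgsql immutable;

create view log_accounts as
select * from accounts where func();

-- a quick way to see what happened to the call: if the planner folded it
-- to a constant, the notice shows up while the query is being planned
-- and the plan itself contains no reference to func() at all
explain select * from log_accounts;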
On Fri, 2007-08-31 at 13:51 -0400, Merlin Moncure wrote:
> > how about this:
> > create view log_yadda as
> > select yadda where (select func());
> >
> > the parens around the function call force it to be evaluated as a scalar.
>
> if you declare func() immutable, you can (maybe) remove the parens
> because the planner folds the call to a constant. This is faster in
> some cases because
>

I like this approach. I'm a little concerned about PREPARE, however. I think that might only execute the function once and then not on subsequent calls, which would then not be audited.

Regards,
Jeff Davis
On 8/31/07, Jeff Davis <pgsql@j-davis.com> wrote:
> On Fri, 2007-08-31 at 13:51 -0400, Merlin Moncure wrote:
> > > how about this:
> > > create view log_yadda as
> > > select yadda where (select func());
> > >
> > > the parens around the function call force it to be evaluated as a scalar.
> >
> > if you declare func() immutable, you can (maybe) remove the parens
> > because the planner folds the call to a constant. This is faster in
> > some cases because
> >
>
> I like this approach. I'm a little concerned about PREPARE, however. I
> think that might only execute the function once and then not on
> subsequent calls, which would then not be audited.

At present, immutable functions are only treated as constants during a query, which is what we want (no problems with prepare). I am a little bit concerned about interactions with queries wrapping the view... you'll just have to try and see.

merlin
"Merlin Moncure" <mmoncure@gmail.com> writes: > At present, immutable functions are only treated as constants during a > query, which is what we want (no problems with prepare). Uh, no, they'd be folded to constants at plan time, which is exactly what Jeff doesn't want AFAICS. The subselect trick is a bit of a hack, but at present it'll work to guarantee that the function is called only once per plan execution. (That's because we'll treat an "uncorrelated" subquery as an InitPlan even if it contains volatile functions, which strictly speaking we should not; but it's a sufficiently useful behavior that I wouldn't want to get rid of it without providing a more principled substitute ...) regards, tom lane
On 8/31/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Merlin Moncure" <mmoncure@gmail.com> writes:
> > At present, immutable functions are only treated as constants during a
> > query, which is what we want (no problems with prepare).
>
> Uh, no, they'd be folded to constants at plan time, which is exactly
> what Jeff doesn't want AFAICS.

yikes! I did test this before I posted that, but I oversimplified it: I didn't move the func() to the where clause... so the subselect version defined as volatile seems the way to go. Unfortunately, this means you pay a small extra price for large result sets.

merlin
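P.S. so the whole recipe, with made-up names (volatile is the default, it's spelled out below only to make the point):

create function audit_read() returns bool as $$
begin
    raise log 'accounts read by %', current_user;
    return true;
end;
$$ language plpgsql volatile;

create view log_accounts as
select * from accounts where (select audit_read());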
On Fri, 2007-08-31 at 16:42 -0400, Merlin Moncure wrote:
> On 8/31/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > "Merlin Moncure" <mmoncure@gmail.com> writes:
> > > At present, immutable functions are only treated as constants during a
> > > query, which is what we want (no problems with prepare).
> >
> > Uh, no, they'd be folded to constants at plan time, which is exactly
> > what Jeff doesn't want AFAICS.
>
> yikes! I did test this before I posted that, but I oversimplified it:
> I didn't move the func() to the where clause... so the subselect
> version defined as volatile seems the way to go. Unfortunately, this
> means you pay a small extra price for large result sets.
>

That sounds like a good solution to me. It looks like the planner is able to optimize the queries, and the audit function is only called once. It sounds like I may need to beware of future changes, however.

What is the small extra price for large tables, though?

Thanks for the help!

Regards,
Jeff Davis