Re: pgaudit - an auditing extension for PostgreSQL - Mailing list pgsql-hackers

From Stephen Frost
Subject Re: pgaudit - an auditing extension for PostgreSQL
Date
Msg-id 20150119032837.GC3062@tamriel.snowman.net
Whole thread Raw
In response to Re: pgaudit - an auditing extension for PostgreSQL  (Abhijit Menon-Sen <ams@2ndQuadrant.com>)
Responses Re: pgaudit - an auditing extension for PostgreSQL
List pgsql-hackers
Abhijit,

Apologies, I've been meaning to go through this for quite some time.

* Abhijit Menon-Sen (ams@2ndQuadrant.com) wrote:
> I've changed pgaudit to work as you suggested.

Great, glad to see that.

> A quick note on the implementation: pgaudit was already installing an
> ExecutorCheckPerms_hook anyway; I adapted code from ExecRTECheckPerms
> to check if the audit role has been granted any of the permissions
> required for the operation.

Sure, makes sense to me.

> This means there are three ways to configure auditing:
>
> 1. GRANT … ON … TO audit, which logs any operations that correspond to
>    the granted permissions.

I was thinking we would be able to configure which role this applies to,
rather than hard-coding 'audit' as *the* role.  Perhaps with a new GUC,
or the existing 'pgaudit.roles' GUC could be used.  Further, this can be
extrapolated out to cover auditing of roles by checking for role
membership in this role, see below.

> 2. Set pgaudit.roles = 'r1, r2, …', which logs everything done by r1,
>    r2, and any of their descendants.

If these roles were the 'audit' roles as considered under #1 then
couldn't administrators control what pgaudit.roles provide today using
role memberships granted to the roles listed?

> 3. Set pgaudit.log = 'read, write, …', which logs any events in any of
>    the listed classes.

Approach #1 addresses this too, doesn't it?  SELECT vs. UPDATE vs.
DELETE, etc?  I'm not sure that this really adds much as it isn't nearly
as granular as the GRANT-based approach since it applies to everything.

One of the really big and interesting distinctions to consider between
checking permissions granted to an 'audit' role vs. role membership is
how DDL is handled.  As currently implemented, role-level auditing is
required to have DDL be logged, but you may very well want to log all
DDL and no DML, for example.  What would be really helpful is to develop
a wiki to cover common use-cases and how to set up pgaudit for them.

> (This is a small change from the earlier behaviour where, if a role was
> listed in .roles, it was still subject to the .log setting. I find that
> more useful in practice, but since we're discussing Stephen's proposal,
> I implemented what he said.)

Right- the idea is to provide a very granular way of specifying what is
to be logged; much more than what the previous pair of GUCs provided.

> The new pgaudit.c is attached here for review. Nothing else has changed
> from the earlier submission; and everything is in the github repository
> (github.com/2ndQuadrant/pgaudit).

There's a few big questions we need to address with pgaudit to have it
go into our repo.

The first is what major version(s) we're targetting.  My feeling is that
we should strip down what goes into contrib to be 9.5 based.  I
certainly understand that you want to support earlier versions but I
don't think it makes sense to try and carry that baggage in contrib when
it won't ever be released with earlier versions.

The second is the USE_DEPARSE_FUNCTIONS-related bits.  I realize that
there's a goal to get additional things into 9.5 from that branch but it
doesn't make sense for the contrib pgaudit to include those components
prior to them actually being in core.  I'm also wondering if pieces of
that are now out-of-date with where core is.  If those parts are going
to go into 9.5 soon (which looks like it may happen..) then hopefully we
can just remove the #define's and clean up the code to use the new
capabilities.

Lastly, I'll echo a concern which Robert has raised which is- how do we
make sure that new commands, etc, are covered?  I don't particularly
like how pgaudit will need to be kept in sync with what's in
ProcessUtility (normal and slow).  I'm actually a bit hopeful that we
can work out a way for pgaudit to help with this through a regression
test which loops over all objects and tests logging each of them.

One of the important aspects of auditing is that what is requested to be
audited is and exactly that- no more, no less.  Reviewing the code makes
me pretty nervous about that actually happening properly, but that's
mostly due to the back-and-forth between different major versions and
the #ifdef/#ifndef's.

Additional comments in-line below.

> enum LogClass {
>     LOG_NONE = 0,
>
>     /* SELECT */
>     LOG_READ = (1 << 0),
>
>     /* INSERT, UPDATE, DELETE, TRUNCATE */
>     LOG_WRITE = (1 << 1),
>
>     /* GRANT, REVOKE, ALTER … */
>     LOG_PRIVILEGE = (1 << 2),
>
>     /* CREATE/DROP/ALTER ROLE */
>     LOG_USER = (1 << 3),
>
>     /* DDL: CREATE/DROP/ALTER */
>     LOG_DEFINITION = (1 << 4),
>
>     /* DDL: CREATE OPERATOR etc. */
>     LOG_CONFIG = (1 << 5),
>
>     /* VACUUM, REINDEX, ANALYZE */
>     LOG_ADMIN = (1 << 6),
>
>     /* Function execution */
>     LOG_FUNCTION = (1 << 7),
>
>     /* Absolutely everything; not available via pgaudit.log */
>     LOG_ALL = ~(uint64)0
> };

I'd really like to see this additional information regarding what kind
a command is codified in core.  Ideally, we'd have a single place which
tracks the kind of command and possibly other information which can then
be addressed all at once when new commands are added.

Also, we could allow more granularity by using the actual classes
which we already have for objects.

> /*
>  * This module collects AuditEvents from various sources (event
>  * triggers, and executor/utility hooks) and passes them to the
>  * log_audit_event() function.
>  *
>  * An AuditEvent represents an operation that potentially affects a
>  * single object. If an underlying command affects multiple objects,
>  * multiple AuditEvents must be created to represent it.
>  */

The above doesn't appear to be accurate (perhaps it once was?) as
log_audit_event() only takes a single AuditEvent structure at a time and
it's not a list.  I'm not sure that needs to change, just pointing out
that the comment appears to be inaccurate.

> typedef struct {
>     NodeTag type;
>     const char *object_id;
>     const char *object_type;
>     const char *command_tag;
>     const char *command_text;
>     bool granted;
> } AuditEvent;

I'm inclined to suggest that the decision be made earlier on about if a
given action should be logged, as the earlier we do that the less work
we have to do and we could avoid having to pass in things like 'granted'
to the log_audit_event function at all.  Having the decision split with
the 'granted' flag being set from the callback and then the
'should_be_logged' happening inside log_audit_event doesn't strike me as
terribly sensible.

> /*
>  * Returns the oid of the hardcoded "audit" role.
>  */
>
> static Oid
> audit_role_oid()
> {
>     HeapTuple roleTup;
>     Oid oid = InvalidOid;
>
>     roleTup = SearchSysCache1(AUTHNAME, PointerGetDatum("audit"));
>     if (HeapTupleIsValid(roleTup)) {
>         oid = HeapTupleGetOid(roleTup);
>         ReleaseSysCache(roleTup);
>     }
>
>     return oid;
> }

The above could go away if we follow my suggestions above, of course.

> /* Returns true if either pgaudit.roles or pgaudit.log is set. */
>
> static inline bool
> pgaudit_configured()
> {
>     return (pgaudit_roles_str && *pgaudit_roles_str) || pgaudit_log != 0;
> }

This feels, to me at least, like it shouldn't be necessary.  If the
check to see if we should be logging the current action is moved up
sufficiently, then we may be able to do away with this function and the
checks associated with it- those would turn into "should we be logging
this?" and, if the lists are empty, would return "no" pretty quickly.

> /*
>  * Takes a role OID and returns true if the role is mentioned in
>  * pgaudit.roles or if it inherits from a role mentioned therein;
>  * returns false otherwise.
>  */
>
> static bool
> role_is_audited(Oid roleid)
> {
>     List *roles;
>     ListCell *lt;
>
>     if (!pgaudit_roles_str || !*pgaudit_roles_str)
>         return false;
>
>     if (!SplitIdentifierString(pgaudit_roles_str, ',', &roles))
>         return false;
>
>     foreach(lt, roles)
>     {
>         char *name = (char *)lfirst(lt);
>         HeapTuple roleTup;
>
>         roleTup = SearchSysCache1(AUTHNAME, PointerGetDatum(name));
>         if (HeapTupleIsValid(roleTup))
>         {
>             Oid parentrole = HeapTupleGetOid(roleTup);
>
>             ReleaseSysCache(roleTup);
>             if (is_member_of_role(roleid, parentrole))
>                 return true;
>         }
>     }
>
>     return false;
> }

The results here should be cached, as was discussed earlier, iirc.

> /*
>  * Takes a role OID and an AuditEvent and returns true or false
>  * depending on whether the event should be logged according to the
>  * pgaudit.roles/log settings. If it returns true, it also fills in the
>  * name of the LogClass which it is to be logged under.
>  */
>
> static bool
> should_be_logged(Oid userid, AuditEvent *e, const char **classname)
> {
>     enum LogClass class = LOG_NONE;
>     char *name;
>
>     /*
>      * Look at the type of the command and decide what LogClass needs to
>      * be enabled for the command to be logged.
>      */
>
>     switch (e->type)
>     {
>         case T_SelectStmt:
>             name = "READ";
>             class = LOG_READ;
>             break;

[... long list of every type we have ...]

I'm really not a fan of these big switch statements which attempt to
cover everything which currently exists.  If we don't come up with a way
to make very sure that these areas are updated as things change, we're
going to end up missing things.  As mentioned up-thread, I'd really like
to see a way to identify these types from core-based information
instead.

>             /*
>              * Anything that's left out of the list above is just noise,
>              * and not very interesting from an auditing perspective. So
>              * there's intentionally no way to enable LOG_ALL.
>              */

This approach really worries me- what about some new type which is
added?  How can we know that it's just "noise"?  Whatever is considered
'noise' should at least be explicitly listed, so we know we're covering
everything.

> /*
>  * Takes an AuditEvent and, if it should_be_logged(), writes it to the
>  * audit log. The AuditEvent is assumed to be completely filled in by
>  * the caller (unknown values must be set to "" so that they can be
>  * logged without error checking).
>  */
>
> static void
> log_audit_event(AuditEvent *e)
> {

So, clearly, this is an area which could use some improvement.
Specifically, being able to write to an independent file instead of just
using ereport(), supporting other output formats (JSON, maybe even a
table..).  Also, have you considered using a dynamic shared memory block
to queue the logging messages to and then a background worker to write
them out in-order to a file?  It'd clearly be similar to our existing
ereport() mechanism, but importantly you'd be able to send it to another
file.  Being able to log directly to syslog or other message queues
would be great also.

These are things which could be added later, of course.

> /*
>  * Create AuditEvents for DML operations via executor permissions
>  * checks. We create an AuditEvent for each table in the list of
>  * RangeTableEntries from the query.
>  */
>
> static void
> log_executor_check_perms(Oid auditOid, List *rangeTabls, bool abort_on_violation)
> {
>     ListCell *lr;
>
>     foreach(lr, rangeTabls)
>     {
>         Oid relOid;
>         Relation rel;
>         AuditEvent e;
>         RangeTblEntry *rte = lfirst(lr);
>         char *relname;
>         const char *tag;
>         const char *reltype;
>         NodeTag type;
>
>         /* We only care about tables, and can ignore subqueries etc. */
>         if (rte->rtekind != RTE_RELATION)
>             continue;
>
>         /*
>          * Get the fully-qualified name of the relation.
>          *
>          * User queries against catalog tables (e.g. "\dt") are logged
>          * here. Should we filter them out, as we do for functions in
>          * pg_catalog?
>          */

I'd probably say 'yes', regarding this, though it might be nice to have
it be optional..

>         relOid = rte->relid;
>         rel = relation_open(relOid, NoLock);
>         relname = quote_qualified_identifier(get_namespace_name(RelationGetNamespace(rel)),
>                                              RelationGetRelationName(rel));
>         relation_close(rel, NoLock);
>
>         /*
>          * We don't have access to the parsetree here, so we have to
>          * generate the node type, object type, and command tag by
>          * decoding rte->requiredPerms and rte->relkind.
>          */

This seems like it could be improved- have you considered suggesting a
hook which is, perhaps, in a better place and could pass in this
information instead of having to attempt to regenerate it?

>         /*
>          * If a role named "audit" exists, we check if it has been
>          * granted permission to perform the operation identified above.
>          * If so, we must log the event regardless of the static pgaudit
>          * settings.
>          */

This isn't quite what was intended.  This isn't about what the audit
role could do but rather auditing specifically what the role has access
to.  The idea is that the permission system is used as a proxy for
configuration at the action+column level.

As an example, if an UPDATE with a WHERE clause is executed and the
audit user has SELECT rights on the table, then the UPDATE should be
logged due to the WHERE clause.  At least, that's been my thinking as it
mirrors our permissions system but, as Robert said, it's an
'anti-permission' approach.  On the flip side, if UPDATE is granted to
the audit role, but an UPDATE with WHERE clause is run then we should
log that- even if the audit role doesn't have SELECT.  This all matches
up (but is opposite of) our regular permission system.  Deviating from
that strikes me as a bad idea.

One caveat on this is that I was thinking the permission would have to
be explicitly granted- that is, grants to PUBLIC would not count.  That
may be more difficult to deal with though, but it'd certainly make
auditing of functions easier as execute is granted to public by default
there and would create a lot of noise otherwise.

> log_utility_command(Node *parsetree,
>                     const char *queryString,
>                     ProcessUtilityContext context,
>                     ParamListInfo params,
>                     DestReceiver *dest,
>                     char *completionTag)

Similar thoughts here wrt the huge switch statement, etc.

> /*
>  * Create AuditEvents for certain kinds of CREATE and ALTER statements,
>  * as detected by log_object_access() in lieu of event trigger support
>  * for them.
>  */

Just to comment on this, the object access framework feels like it
should, perhaps, be reworked to be based on event triggers..  I'm
nervous that we're ending up with so many different ways to deal with
what's happening under ProcessUtility, and that we've now split that in
half between what can be done with event triggers and what can't, with
no way for modules to easily know which is which.

> log_function_execution(Oid objectId)
> {
>     HeapTuple proctup;
>     Form_pg_proc proc;
>     const char *name;
>     AuditEvent e;
>
>     proctup = SearchSysCache1(PROCOID, ObjectIdGetDatum(objectId));
>     if (!proctup)
>         elog(ERROR, "cache lookup failed for function %u", objectId);
>     proc = (Form_pg_proc) GETSTRUCT(proctup);
>
>     /*
>      * Logging execution of all pg_catalog functions would
>      * make the log unusably noisy.
>      */
>
>     if (IsSystemNamespace(proc->pronamespace))
>     {
>         ReleaseSysCache(proctup);
>         return;
>     }
>
>     name = quote_qualified_identifier(get_namespace_name(proc->pronamespace),
>                                       NameStr(proc->proname));
>     ReleaseSysCache(proctup);
>
>     e.type = T_ExecuteStmt;
>     e.object_id = name;
>     e.object_type = "FUNCTION";
>     e.command_tag = "EXECUTE";
>     if (debug_query_string)
>         e.command_text = debug_query_string;
>     else
>         e.command_text = "";
>     e.granted = false;
>
>     log_audit_event(&e);
> }

This doesn't address overloaded functions, does it?  It probably
should include the arguments and types in 'name'.
Thanks!
    Stephen

pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: Re: Better way of dealing with pgstat wait timeout during buildfarm runs?
Next
From: Ashutosh Bapat
Date:
Subject: Re: Partitioning: issues/ideas (Was: Re: On partitioning)