Auditing via logical decoding - Mailing list pgsql-hackers

From Philip Scott
Subject Auditing via logical decoding
Date
Msg-id 40d17f66c2d979f27c5431a04f246434@safetyphil.com
Whole thread Raw
Responses Re: Auditing via logical decoding  (Jeremy Finzel <finzelj@gmail.com>)
Re: Auditing via logical decoding  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers
Hi Postgres Hackers,

We have been using our own trigger-based audit system at my firm 
successfully for some years, but the performance penalty is starting to 
grate a bit and so I have been tasked with seeing if we can make use of 
the new logical decoding functions to achieve the same thing. I thought 
that someone must already have written something that would satisfy our 
use-case but my internet searches have come up short so far so I am 
considering writing a logical decoding plugin to do what we want.

I thought I would run the idea past you all here just in case my plan is 
crazy; I’ve browsed around the postgres source code a bit before but 
I’ve never really gotten my hands dirty and am a little bit nervous 
about putting my own C code into the heart of our DBMS so if this comes 
to anything I would like to offer my code up for review and/or possible 
inclusion as a contributed module.

A quick summary of requirements:

We want to log (to a separate, remote database)
   - One row for every transaction that changes the state of the 
database.
     We call this table ‘audit_entry’ and contains the xid, transaction 
timestamp, username, client hostname, and application name of the 
session that caused the change.
   - One row for each change made by each transaction which records the 
state of the tuple before the change.
     We call this table ‘audit_detail’ and contains xid, statement 
timestamp, table name & schema, event_type, primary_key (hstore), 
old_row (hstore), and the text of the query that was responsible for the 
change.

A lot of that information is available already by listening to the 
pgoutput decoding, and my first thought was that I could just write a 
receiver for that. However, application name, username, client hostname 
and current_query() are not available. This is understandable as they 
aren’t useful for logical replication.

I was about to give up, when I discovered pg_logical_emit_message.

My current thoughts are to:
   - Write this extra data into a logical message while the transaction 
is still in progess
     Either with a deferred trigger per table or, perhaps better
    Find some global commit-time (or xid-assigment time) hook emit it there

   - Then get the information out of the database:
     Either modify the existing pgoutput plugin & protocol to forward 
such messages in its stream,
     Or write a dedicated ‘audit’ decoding plugin with its own protocol

   - Then get the information into the ‘auditing’ database:
     Either with some standalone process that connects to both, consumes 
the output created above, translates it to SQL to run in the auditing 
DB.
     Figure out how to create a proper postgres background process to do 
it, in a similar fashion to the logical replication worker

Any input you folks have would be very much appreciated.

Kinds Regards,

Philip

PS: If there is someone out there who is willing & able to build this 
for less than my company will have to pay me to do it, please drop me a 
line ☺



pgsql-hackers by date:

Previous
From: Ashutosh Bapat
Date:
Subject: Re: [HACKERS] [PATCH] Overestimated filter cost and its mitigation
Next
From: Jeremy Finzel
Date:
Subject: Re: Auditing via logical decoding