Re: [Solved] Generic logging system for pre-hstore using plperl triggers - Mailing list pgsql-general

From Merlin Moncure
Subject Re: [Solved] Generic logging system for pre-hstore using plperl triggers
Date
Msg-id CAHyXU0wKAmPkRTc5O_+XGWEah-iWQmsuqKfGv=c4OKiJS8aSkw@mail.gmail.com
In response to Re: [Solved] Generic logging system for pre-hstore using plperl triggers  (Diego Augusto Molina <diegoaugustomolina@gmail.com>)
Responses Re: [Solved] Generic logging system for pre-hstore using plperl triggers  (Diego Augusto Molina <diegoaugustomolina@gmail.com>)
List pgsql-general
On Wed, Sep 28, 2011 at 8:20 AM, Diego Augusto Molina
<diegoaugustomolina@gmail.com> wrote:
> 2011/9/27, Merlin Moncure <mmoncure@gmail.com>:
>> *) when posting schema definitions, particularly in email format, try
>> not to use dumped definitions from pg_dump or pgadmin. This creates a
>> lot of noise in the script that detracts from what you are trying to
>> do.   Also an attached file would probably have been more appropriate.
>
> Right! I'm sending it attached, from the source files instead of
> the pgAdminIII dump. Well, that'll be in the next mail; I don't have
> the files right now.
>
>> *) using smallint key for client_inet is really dubious.  why not just
>> use the inet itself?
>
> Sure, there are two reasons for this:
>
> 1) I wanted the audit table to be as narrow as possible. Believe me, I
> migrated the tens of millions of tuples from the previous (slow,
> inefficient, extremely difficult to maintain) logging system to a test
> server just to try things out: the narrower the table, the better it
> performs on searches. And I mean it! You wouldn't imagine. I don't
> know what I did with the annotated results, but I will look for them
> so I can share them.
>
> 2) I put many things outside the audit table (like the table, schema
> and field names); that makes it narrower, but it also makes it very
> easy to see all values without querying the audit table: I can see in
> a very small table all the clients' inets (and you could gather more
> info there if you wanted to). Note that for us most of the accesses to
> the database come from the web server, which implements a new version
> of a big application, so it would be a pity to spend 5 extra bytes
> per row (7 bytes for an IPv4 inet vs. 2 for a smallint) just to store
> mostly the same IP address over and over. So, why bother logging the
> IP at all? Well, besides completeness, it lets us see whether the
> access came from the new application or from the old one, which
> accessed the database server directly from the client's computer. A
> sketch of that mapping follows.
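>
> A minimal sketch of the lookup approach (the names here are
> illustrative, not the real definitions):
>
>     CREATE TABLE client_inet (
>         id   smallint PRIMARY KEY,
>         addr inet NOT NULL UNIQUE
>     );
>
>     -- The audit table stores only the 2-byte id; each distinct
>     -- client address is stored once in this side table.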
>
> Other fields, namely client_port and pid, aren't mapped out to other
> tables because they don't add much to the table width and because I
> wouldn't analyze their contents very often: like never, but some day
> there may be some kind of weird attack that needs to be analyzed with
> this data.
>
>> *) what is the audet table for?  Are you truly storing a record for
>> every field of every audited table?  This will be incredibly
>> inefficient, especially for large, wide tables.
>
> See the answer about the criteria below.
>
>> *) surely, creating a table called 'table' is not a good idea.
>
> Ok, name it "tables" then.
>
>> *) this approach obviously is a lot more complicated than hstore.
>> however, for 8.4 and down, hstore won't work. but, what about just
>> storing the record as text?
>
> Third time lucky! See the next...
>
>> *) I can't quite follow the perl criterion steps -- what is happening
>> there?  What are the loops doing?
>
> We have some very wide tables (20 or more columns). We only log the
> values of two kinds of columns: i) those which make up the primary
> key of the table (which helps to track down the alteration); and ii)
> those whose values change in the event. Note that columns in group i)
> can also be in group ii).
> That is what makes the criteria complex: they are meant to determine
> the primary key of the table at any cost. Each failing criterion
> makes the following one take place. These are the criteria I could
> think of:
> 1) If we got parameters, _check_ them and consider each of them as one
> column of the primary key. This is the cheapest path and almost
> always the one taken. We really have to check, because if there's,
> say, a typo, the whole transaction outside the trigger would fail
> unconditionally, and we want this logging system to interfere as
> little as possible. A little less performance in exchange for some
> more stability.
> 2) Search the system catalogs for a primary key constraint.
> 3) Search the system catalogs for the unique constraint with the
> fewest columns (in fact, I think it should be "the narrowest unique
> constraint"). A sketch of this catalog search follows the list.
> 4) If the table has OIDs, use those and emit a warning (that's never
> a real pk, unless you make an external unique index, which I don't
> intend to check right now).
> 5) The "else" (or "default") case is to log every column, emitting a
> warning (Really guys, use primary keys! ;).
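>
> Roughly, the catalog search behind 2) and 3) boils down to a query
> like this (a standalone sketch, not the actual trigger code; the
> table name is made up):
>
>     -- Prefer the primary key; otherwise take the unique constraint
>     -- with the fewest columns. conkey holds the attribute numbers,
>     -- which can be resolved to names through pg_attribute.
>     SELECT conname, conkey
>       FROM pg_constraint
>      WHERE conrelid = 'public.some_table'::regclass
>        AND contype IN ('p', 'u')
>      ORDER BY (contype = 'p') DESC,
>               array_length(conkey, 1)  -- array_upper() on pre-8.4
>      LIMIT 1;
>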
> We don't bear this complexity every time, but only once: if criterion
> 1) fails, then after determining the "primary key" we can execute a
> string which drops this same trigger and re-creates it, passing it
> the names of the columns that were determined to be the pk, so that
> the next time we never go further than 1). This works; I tried it out
> some time ago but never made the change (oops!). I mean, dropping the
> trigger from the trigger itself (the docs say that 'alter trigger'
> can only rename it).
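>
> The dynamic SQL issued from inside the trigger would look roughly
> like this (trigger, function and table names are made up for the
> example):
>
>     DROP TRIGGER audit_trg ON some_table;
>     CREATE TRIGGER audit_trg AFTER INSERT OR UPDATE OR DELETE
>         ON some_table FOR EACH ROW
>         EXECUTE PROCEDURE audit_fn('id');  -- pk column names as args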
>
> In my case, all this headache was needed: it was specified as one of
> the requirements of the logging system that every tuple should be
> trackable. Using a recursive "with" query, it is possible to track
> down the changes to any single tuple in the audited tables (or make a
> function out of it, whatever); a sketch follows. Fortunately, they
> never specified a maximum time for that ;). If instead we had made a
> string from the record, we wouldn't have been able to track the tuple
> easily.
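>
> A hypothetical sketch of such a tracking query (the audit/audet
> column names are invented for the example, and it is simplified: a
> real query would also match the tuple's pk values):
>
>     WITH RECURSIVE history AS (
>         -- start from a known audit entry for the tuple of interest
>         SELECT a.id, a.tstamp, d.field, d.old_value, d.new_value
>           FROM audit a
>           JOIN audet d ON d.audit_id = a.id
>          WHERE a.id = 12345
>         UNION ALL
>         -- the previous event for the same field is the one whose
>         -- new_value matches our old_value
>         SELECT a.id, a.tstamp, d.field, d.old_value, d.new_value
>           FROM history h
>           JOIN audet d ON d.field = h.field
>                       AND d.new_value = h.old_value
>           JOIN audit a ON a.id = d.audit_id
>     )
>     SELECT * FROM history ORDER BY tstamp;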
>
> Note also the "rotate(character)" function. It hot-rotates the
> audit/audet tables, similarly to a logrotate program on *nix. You
> never stop logging, and you can dump the old table before dropping it
> (to save space) and restore it somewhere else to exploit it. We would
> usually rotate every 6 months or so, but that period keeps shortening
> as the system grows.
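>
> The idea behind rotate(), reduced to plain SQL (simplified; the
> suffix naming is made up):
>
>     BEGIN;
>     ALTER TABLE audit RENAME TO audit_2011h1;
>     ALTER TABLE audet RENAME TO audet_2011h1;
>     CREATE TABLE audit (LIKE audit_2011h1
>                         INCLUDING DEFAULTS INCLUDING INDEXES);
>     CREATE TABLE audet (LIKE audet_2011h1
>                         INCLUDING DEFAULTS INCLUDING INDEXES);
>     COMMIT;
>     -- later: pg_dump the renamed tables, then drop them to save space.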
>
> The *real* system sees very few I/U/D operations: it's unusual to
> have more than one of them per second. On a busier transactional
> system I think this logging system would be more noticeable (right
> now things run smoothly). As I said before, if you have test cases
> they're very welcome.


Yup -- I get it now.  Well, one point about this is that it seems
directed towards your personal requirements. This is a very 'heavy'
audit system that will not be suitable for high performance
transactional systems.  That said, it looks well thought out.  Storing
only the changed fields is pretty clever, but I can't help but wonder
if you're better off using arrays for that purpose:

create type audit_field_t as (field text, old_value text, new_value text);

and inside the audit table itself having
  fields audit_field_t[],

and, if at all possible, constructing the array of audit fields in a
single expression.  This will be much more compact than one record per
field -- normally, arrays in table definitions tend to be bad mojo, but
this is one case where they could be useful.  Audit records are WORM,
'Write Once Read Maybe', so compactness is important.  Obviously, for
9.0+, I would rig an 'all sql' solution around hstore, which is
usually better if you can get away with it.
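
A sketch of building that array in one expression from inside a
trigger function (plpgsql shown; the audit columns and the audited
columns a and b are made up; UPDATE case only, keeping just the
columns that changed):

    INSERT INTO audit (tstamp, tablename, fields)
    VALUES (
        now(),
        TG_TABLE_NAME,
        ARRAY(
            SELECT ROW(f.field, f.old_value, f.new_value)::audit_field_t
              FROM (VALUES ('a', OLD.a::text, NEW.a::text),
                           ('b', OLD.b::text, NEW.b::text))
                   AS f(field, old_value, new_value)
             WHERE f.old_value IS DISTINCT FROM f.new_value
        )
    );

For 9.0+, hstore(NEW) - hstore(OLD) yields the changed columns with
their new values in a single expression, which is what makes the
hstore route so convenient.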

merlin
