Thread: [Solved] Generic logging system for pre-hstore using plperl triggers
From: Diego Augusto Molina
Hi, I had to implement a logging system for some DBs at work. It's generic and implemented in plperl. I've seen tons of mails on the list from newbies asking for something like this in plpgsql, but no specific solution is pointed out for them, which I think may discourage some of them.

The system lives in a separate schema (with whatever name you want) and has some really nice features: the relevant tables can be rotated to facilitate auditing, each logged action refers to the modified tuple by primary key, which you don't even have to (but can) specify when creating the trigger, and more. Performance is very acceptable (test cases and suggestions are welcome), and it hasn't had a problem in the roughly 8 months it's been working.

The wiki mentions some approaches, but it's all _too_ general, and on a vanilla pg 8.4 you won't have the hstore facility (note that upgrading is not always a choice).

Is it worth pasting the little code in here, or will that be ignored?

--
Diego Augusto Molina
diegoaugustomolina@gmail.com
ES: Por favor, evite adjuntar documentos de Microsoft Office. Serán desestimados.
EN: Please, avoid attaching Microsoft Office documents. They shall be discarded.
LINK: http://www.gnu.org/philosophy/no-word-attachments.html
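As a concrete sketch of the usage described, attaching the system to a table would look roughly like this (the table `person` and its columns are hypothetical; `audit.audit()` is the generic trigger function of the system):

```sql
-- Hypothetical audited table.
CREATE TABLE public.person (
    id   integer PRIMARY KEY,
    name text NOT NULL
);

-- Attach the generic audit trigger. The argument names the pk column;
-- with no arguments the trigger probes the catalogs for the pk itself.
CREATE TRIGGER tg_audit_person
    AFTER INSERT OR UPDATE OR DELETE ON public.person
    FOR EACH ROW EXECUTE PROCEDURE audit.audit('id');
```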
Re: [Solved] Generic logging system for pre-hstore using plperl triggers
From: Filip Rembiałkowski
2011/9/27 Diego Augusto Molina <diegoaugustomolina@gmail.com>
If you can share it, do it; maybe someone will make use of your work, and that's always welcome.
Publishing it on the web and linking to it here, instead of pasting, would be even better.
Filip
Re: [Solved] Generic logging system for pre-hstore using plperl triggers
From: Diego Augusto Molina
/* Created by Diego Augusto Molina in 2011 for Tucuman Government, Argentina. */

/* -- Execute the following according to your needs.
CREATE TRUSTED PROCEDURAL LANGUAGE 'plpgsql';
CREATE TRUSTED PROCEDURAL LANGUAGE 'plperl';
*/

CREATE ROLE auditor NOSUPERUSER NOINHERIT NOCREATEDB NOCREATEROLE;
CREATE ROLE audit LOGIN ENCRYPTED PASSWORD 'test.1234'
  NOSUPERUSER NOINHERIT NOCREATEDB NOCREATEROLE;

CREATE SCHEMA audit AUTHORIZATION audit;
ALTER ROLE auditor SET search_path = audit;
ALTER ROLE audit SET search_path = audit;
SET search_path = audit;
SET SESSION AUTHORIZATION audit;

CREATE SEQUENCE seq_audit INCREMENT 1
  MINVALUE -9223372036854775808 MAXVALUE 9223372036854775807
  START 0 CACHE 1 CYCLE;
ALTER TABLE seq_audit OWNER TO audit;

CREATE SEQUENCE seq_elems INCREMENT 1
  MINVALUE -32768 MAXVALUE 32767 START 0 CACHE 1 CYCLE;
ALTER TABLE seq_elems OWNER TO audit;

-- Dictionary tables: each holds the distinct values seen so far,
-- keyed by a smallint to keep the main log tables narrow.
CREATE TABLE field (
  id    smallint NOT NULL DEFAULT nextval('seq_elems'::regclass),
  value name NOT NULL,
  CONSTRAINT field_pk       PRIMARY KEY (id) WITH (FILLFACTOR=100),
  CONSTRAINT field_uq_value UNIQUE (value)   WITH (FILLFACTOR=100)
) WITH (OIDS=FALSE);
ALTER TABLE field OWNER TO audit;
GRANT ALL ON TABLE field TO audit;
GRANT SELECT ON TABLE field TO auditor;

CREATE TABLE client_inet (
  id    smallint NOT NULL DEFAULT nextval('seq_elems'::regclass),
  value inet NOT NULL DEFAULT inet_client_addr(),
  CONSTRAINT dir_inet_pk       PRIMARY KEY (id) WITH (FILLFACTOR=100),
  CONSTRAINT dir_inet_uq_value UNIQUE (value)   WITH (FILLFACTOR=95)
) WITH (OIDS=FALSE);
ALTER TABLE client_inet OWNER TO audit;
GRANT ALL ON TABLE client_inet TO audit;
GRANT SELECT ON TABLE client_inet TO auditor;

CREATE TABLE schema (
  id    smallint NOT NULL DEFAULT nextval('seq_elems'::regclass),
  value name NOT NULL,
  CONSTRAINT schema_pk       PRIMARY KEY (id) WITH (FILLFACTOR=100),
  CONSTRAINT schema_uq_value UNIQUE (value)   WITH (FILLFACTOR=100)
) WITH (OIDS=FALSE);
ALTER TABLE schema OWNER TO audit;
GRANT ALL ON TABLE schema TO audit;
GRANT SELECT ON TABLE schema TO auditor;

-- "table" and "user" are reserved words and must be double-quoted.
CREATE TABLE "table" (
  id    smallint NOT NULL DEFAULT nextval('seq_elems'::regclass),
  value name NOT NULL,
  CONSTRAINT table_pk       PRIMARY KEY (id) WITH (FILLFACTOR=100),
  CONSTRAINT table_uq_value UNIQUE (value)   WITH (FILLFACTOR=100)
) WITH (OIDS=FALSE);
ALTER TABLE "table" OWNER TO audit;
GRANT ALL ON TABLE "table" TO audit;
GRANT SELECT ON TABLE "table" TO auditor;

CREATE TABLE "user" (
  id    smallint NOT NULL DEFAULT nextval('seq_elems'::regclass),
  value name NOT NULL DEFAULT "current_user"(),
  CONSTRAINT user_pk       PRIMARY KEY (id) WITH (FILLFACTOR=100),
  CONSTRAINT user_uq_value UNIQUE (value)   WITH (FILLFACTOR=95)
) WITH (OIDS=FALSE);
ALTER TABLE "user" OWNER TO audit;
GRANT ALL ON TABLE "user" TO audit;
GRANT SELECT ON TABLE "user" TO auditor;

-- Main log: one row per audited event.
CREATE TABLE audit (
  id          bigint,
  type        character(1),
  tstmp       timestamp with time zone DEFAULT now(),
  schema      smallint,
  "table"     smallint,
  "user"      smallint,
  client_inet smallint,
  client_port integer DEFAULT inet_client_port(),
  pid         integer DEFAULT pg_backend_pid()
) WITH (OIDS=FALSE);
ALTER TABLE audit OWNER TO audit;
GRANT ALL ON TABLE audit TO audit;
GRANT SELECT ON TABLE audit TO auditor;

-- Log detail: one row per logged column of each event.
CREATE TABLE audet (
  id     bigint,
  field  smallint,
  is_pk  boolean,
  before text,
  after  text
) WITH (OIDS=FALSE);
ALTER TABLE audet OWNER TO audit;
GRANT ALL ON TABLE audet TO audit;
GRANT SELECT ON TABLE audet TO auditor;

-- Redirect inserts on audet to the current rotation child table,
-- whose suffix is passed as the trigger argument.
CREATE OR REPLACE FUNCTION tgf_ins_audet() RETURNS trigger AS
$BODY$
begin
  execute E'insert into audet_' || tg_argv[0]
    || E' (id, field, is_pk, before, after) values ('
    || coalesce(new.id::text, 'NULL')    || E', '
    || coalesce(new.field::text, 'NULL') || E', '
    || coalesce(new.is_pk::text, 'NULL') || E', '
    || coalesce(quote_literal(new.before), 'NULL') || E', '
    || coalesce(quote_literal(new.after), 'NULL')  || E')';
  return null;
end;
$BODY$ LANGUAGE plpgsql VOLATILE;
ALTER FUNCTION tgf_ins_audet() SET search_path = audit;
ALTER FUNCTION tgf_ins_audet() OWNER TO audit;
GRANT EXECUTE ON FUNCTION tgf_ins_audet() TO audit;

-- Same redirection for audit.
CREATE OR REPLACE FUNCTION tgf_ins_audit() RETURNS trigger AS
$BODY$
begin
  execute E'insert into audit_' || tg_argv[0]
    || E' (id, type, tstmp, schema, "table", "user", client_inet, client_port, pid) values ('
    || coalesce(new.id::text, 'NULL')             || E', '
    || coalesce(quote_literal(new.type), 'NULL')  || E', '
    || coalesce(quote_literal(new.tstmp), 'NULL') || E', '
    || coalesce(new.schema::text, 'NULL')         || E', '
    || coalesce(new."table"::text, 'NULL')        || E', '
    || coalesce(new."user"::text, 'NULL')         || E', '
    || coalesce(new.client_inet::text, 'NULL')    || E', '
    || coalesce(new.client_port::text, 'NULL')    || E', '
    || coalesce(new.pid::text, 'NULL')            || E')';
  return null;
end;
$BODY$ LANGUAGE plpgsql VOLATILE;
ALTER FUNCTION tgf_ins_audit() SET search_path = audit;
ALTER FUNCTION tgf_ins_audit() OWNER TO audit;
GRANT EXECUTE ON FUNCTION tgf_ins_audit() TO audit;

CREATE TRIGGER tg_audit_20110518 BEFORE INSERT ON audit
  FOR EACH ROW EXECUTE PROCEDURE tgf_ins_audit(20110518);
CREATE TRIGGER tg_audet_20110907 BEFORE INSERT ON audet
  FOR EACH ROW EXECUTE PROCEDURE tgf_ins_audet(20110907);

CREATE OR REPLACE FUNCTION rotate(character) RETURNS void AS
$BODY$
/* Created by Diego Augusto Molina in 2011 for Tucuman Government, Argentina. */
declare
  first_execution boolean := false;
  cur_start char(8) := null;
  cur_tstmp_min timestamp with time zone;
  cur_tstmp_max timestamp with time zone;
  cur_id_min bigint;
  cur_id_max bigint;
  new_start char(8);
begin
  /* Determine the creation tstmp of the tables =============================================== */
  select substring(max(c.relname::text) from $1 || E'_(........)')
    into cur_start
    from pg_namespace n
    inner join pg_class c on (n.oid = c.relnamespace)
    where n.nspname = 'audit'::name
      and c.relname::text like $1 || '_%';
  if cur_start is null then
    first_execution := true;
    cur_start := '';
  end if;
  new_start := cast(to_char(current_timestamp, 'YYYYMMDD') as name);

  case $1
  when 'audit' then
    /* Rotating the table "audit" ============================================================= */
    if not first_execution then
      /* current table */
      execute 'select min(tstmp), max(tstmp) from audit_' || cur_start
        into cur_tstmp_min, cur_tstmp_max;
      execute $$
        alter index idx_audit_$$|| cur_start ||$$_id set (fillfactor = 100);
        alter index idx_audit_$$|| cur_start ||$$_tstmp set (fillfactor = 100);
        alter index idx_audit_$$|| cur_start ||$$_schema__table set (fillfactor = 100);
        alter index idx_audit_$$|| cur_start ||$$_user set (fillfactor = 100);
        cluster audit_$$|| cur_start ||$$;
        analyze audit_$$|| cur_start ||$$;
        alter table audit_$$|| cur_start ||$$
          add constraint audit_$$|| cur_start ||$$_ck_exclusion
          check ( tstmp >= '$$|| cur_tstmp_min ||$$'
              and tstmp <= '$$|| cur_tstmp_max ||$$' )
      $$;
    end if;
    execute $$
      /* new table */
      create table audit_$$|| new_start ||$$ () inherits (audit);
      create index idx_audit_$$|| new_start ||$$_id
        on audit_$$|| new_start ||$$ using btree (id) with (fillfactor = 99);
      create index idx_audit_$$|| new_start ||$$_tstmp
        on audit_$$|| new_start ||$$ using btree (tstmp) with (fillfactor = 99);
      create index idx_audit_$$|| new_start ||$$_schema__table
        on audit_$$|| new_start ||$$ using btree (schema, "table") with (fillfactor = 95);
      create index idx_audit_$$|| new_start ||$$_user
        on audit_$$|| new_start ||$$ using btree ("user") with (fillfactor = 95);
      cluster audit_$$|| new_start ||$$ using idx_audit_$$|| new_start ||$$_tstmp;
      /* Parent table */
      drop trigger if exists tg_audit_$$|| cur_start ||$$ on audit;
      create trigger tg_audit_$$|| new_start ||$$ before insert on audit
        for each row execute procedure tgf_ins_audit('$$|| new_start ||$$');
    $$;
  when 'audet' then
    /* Rotating the table "audet" ============================================================= */
    if not first_execution then
      /* current table */
      execute 'select min(id), max(id) from audet_' || cur_start
        into cur_id_min, cur_id_max;
      execute $$
        alter index idx_audet_$$|| cur_start ||$$_id set (fillfactor = 100);
        alter index idx_audet_$$|| cur_start ||$$_fieldpk set (fillfactor = 100);
        cluster audet_$$|| cur_start ||$$;
        analyze audet_$$|| cur_start ||$$;
        alter table audet_$$|| cur_start ||$$
          add constraint audet_$$|| cur_start ||$$_ck_exclusion
          check ( id >= '$$|| cur_id_min ||$$' and id <= '$$|| cur_id_max ||$$' );
        /* Parent table */
        drop trigger tg_audet_$$|| cur_start ||$$ on audet;
      $$;
    end if;
    execute $$
      /* new table */
      create table audet_$$|| new_start ||$$ () inherits (audet);
      create index idx_audet_$$|| new_start ||$$_id
        on audet_$$|| new_start ||$$ using btree (id) with (fillfactor = 99);
      create index idx_audet_$$|| new_start ||$$_fieldpk
        on audet_$$|| new_start ||$$ using btree (field) with (fillfactor = 99)
        where is_pk;
      cluster audet_$$|| new_start ||$$ using idx_audet_$$|| new_start ||$$_id;
      /* Parent table */
      create trigger tg_audet_$$|| new_start ||$$ before insert on audet
        for each row execute procedure tgf_ins_audet('$$|| new_start ||$$');
    $$;
  else
    /* wrong argument ========================================================================= */
    raise notice E'Error: expected \'audit\' or \'audet\'. Got \'%\'.', $1;
    return;
  end case;
end;
$BODY$ LANGUAGE plpgsql VOLATILE SECURITY DEFINER;
ALTER FUNCTION rotate(character) SET search_path = audit;
ALTER FUNCTION rotate(character) OWNER TO audit;
GRANT EXECUTE ON FUNCTION rotate(character) TO audit;

CREATE OR REPLACE FUNCTION audit() RETURNS trigger AS
$BODY$
## Created by Diego Augusto Molina in 2011 for Tucuman Government, Argentina.
## This is the trigger function which should be called by any table we want to audit. If it
## receives arguments, they will be interpreted as the primary key of the table. If any of the
## arguments is not a column of the table, or no arguments are received at all, a probing
## process determines the pk of the table. Thus, it is better to create the trigger calling
## this function with no arguments. See the TODO list.
## Usage:
##   CREATE TRIGGER <tg_name> AFTER {INSERT | UPDATE | DELETE} [ OR ...] ON <table_name>
##     FOR EACH ROW EXECUTE PROCEDURE audit.audit( { <column_list> | <nothing_at_all> } );
## KNOWN ISSUE #1: you don't want to use this trigger on a table which has a column of some
##   type that doesn't have an implicit cast to string. That would cause a runtime error,
##   killing your transaction! See TODO #2. Rest easy: most 'common' types have an implicit
##   cast to string by default.
## TODO #1: in 'P3', instead of asking '( scalar keys %pk == 0 )' each time, put an 'else'.
## TODO #1.1: if the pk was not passed as argument, at the end of the probing execute an
##   'alter trigger' so that next time there's no probing at all. This would be unfriendly
##   with modifications to the table definition, which should carry an update of the trigger
##   (putting no arguments at all would imply probing again just the first time, after which
##   we simply use the result).
## TODO #2: search for a way to save the binary contents of the columns instead of the
##   formatted content. The table 'field' would have an extra column of type 'type' which
##   would help describe the field audited. That would solve the problem with strange fields
##   (assuming that _any_ value can be converted to its binary/internal representation).
##   This may carry some extra complexity, maybe needing extra tables holding information
##   about types.
## TODO #3: make this function receive only two parameters: two arrays of type name[], the
##   first holding the set of columns which make up the primary key, the second the set of
##   columns which, in addition to the pk ones, are to be registered. Note that pk columns
##   will always be registered because they identify the modified tuple.
## TODO #4: support for the TRUNCATE event.

# P0. Declarations and general definitions ###################################################
my $elog_pref = "(schm:$_TD->{table_schema};tab:$_TD->{table_name};trg:$_TD->{name};evt:$_TD->{event}):";
my $rv  = "";  # Query execution
my $val = "";  # Iterating value
my %tables = ( # Values for the respective dictionary tables referenced from "audit"
  "user"        => 'pg_catalog."session_user"()',
  "table"       => "'$_TD->{table_name}'",
  "schema"      => "'$_TD->{table_schema}'",
  "client_inet" => "pg_catalog.inet_client_addr()"
);
my $id     = "";  # Id of the tuple inserted in "audit"
my $field  = "";  # Field id
my $is_pk  = 0;   # Determines if a field is part of the PK
my $before = "";  # Value of a field in OLD
my $after  = "";  # Value of a field in NEW
my %cols   = ();  # Columns of the table
my %pk     = ();  # Primary key

# Copy columns from whichever transition variable is available -------------------------------
if (exists $_TD->{new}){
  %cols = %{$_TD->{new}};
} else {
  %cols = %{$_TD->{old}};
}

# P1. Create the necessary tuples in "user", "table", "schema" and "client_inet" #############
foreach $val (keys %tables){
  $rv = spi_exec_query("select id from \"$val\" where value = $tables{$val}");
  if ( $rv->{status} ne SPI_OK_SELECT ){
    elog(ERROR, "$elog_pref Error querying table '$val'.");
  }
  if ( $rv->{processed} == 1 ){
    $tables{$val} = $rv->{rows}[0]->{id};
  } else {
    $rv = spi_exec_query("insert into \"$val\" (value) values ($tables{$val}) returning id");
    if ( $rv->{status} ne SPI_OK_INSERT_RETURNING ){
      elog(ERROR, "$elog_pref Error inserting in table '$val'.");
    }
    $tables{$val} = $rv->{rows}[0]->{id};
  }
}

# P2. Insert in "audit" ######################################################################
$rv = spi_exec_query("select nextval('seq_audit'::regclass) as id");
if ( $rv->{status} ne SPI_OK_SELECT ){
  elog(ERROR, "$elog_pref Error querying next value of sequence 'seq_audit'.");
}
$id = $rv->{rows}[0]->{id};
$rv = spi_exec_query("insert into audit (id, type, schema, \"table\", \"user\", client_inet)
  values ( $id, substring('$_TD->{event}', 1, 1), $tables{'schema'}, $tables{'table'},
           $tables{'user'}, $tables{'client_inet'} )");
if ( $rv->{status} ne SPI_OK_INSERT ){
  elog(ERROR, "$elog_pref Error inserting tuple in table 'audit'.");
}

# P3. Determine the PK of the table ##########################################################
if ( scalar keys %pk == 0 ){
  # Criterion 1: if we got params, each of them is a column of the table, and all of them
  # make up the pk of the table --------------------------------------------------------------
  elog(DEBUG, "$elog_pref Searching pk in the trigger's params.");
  if ( $_TD->{argc} > 0 ){
    ARGS: foreach $val ( @{$_TD->{args}} ){
      if (exists $cols{$val}){
        $pk{$val} = "-";
      } else {
        %pk = ();
        elog(DEBUG, "$elog_pref The column '$val' given as argument does not exist. Skipping to next criterion.");
        last ARGS;
      }
    }
  }
}
if ( scalar keys %pk == 0 ){
  # Criterion 2: search for the pk in the system catalogs ------------------------------------
  elog(DEBUG, "$elog_pref Searching pk in system catalogs.");
  $rv = spi_exec_query("
    select a.attname
    from (
      select cl.oid, unnest(c.conkey) as att
      from pg_catalog.pg_constraint c
      inner join pg_catalog.pg_class cl on (c.conrelid = cl.oid)
      where c.contype = 'p' and cl.oid = $_TD->{relid}
    ) as c
    inner join pg_catalog.pg_attribute a on (c.att = a.attnum and c.oid = a.attrelid)
  ");
  if ( $rv->{status} eq SPI_OK_SELECT ){
    if ( $rv->{processed} > 0 ){
      foreach $val ( @{$rv->{rows}} ){
        $pk{$val->{attname}} = "-";
      }
    }
  } else {
    elog(DEBUG, "$elog_pref Error querying the system catalogs. Skipping to next criterion.");
  }
}
if ( scalar keys %pk == 0 ){
  # Criterion 3: if the table has OIDs, use that as pk and emit a warning --------------------
  elog(DEBUG, "$elog_pref Searching OIDs in the table.");
  $rv = spi_exec_query("select * from pg_catalog.pg_class where oid = $_TD->{relid} and relhasoids = true");
  if ( $rv->{status} eq SPI_OK_SELECT ){
    if ( $rv->{processed} > 0 ){
      %pk = ("oid", "-");
      elog(DEBUG, "$elog_pref Using OIDs as table pk for '$_TD->{table_name}' because no previous criterion could find one.");
    }
  } else {
    elog(DEBUG, "$elog_pref Error querying the system catalogs. Skipping to next criterion.");
  }
}
if ( scalar keys %pk == 0 ){
  # Default criterion: all columns -----------------------------------------------------------
  elog(DEBUG, "$elog_pref Could not find a suitable pk. Logging every column.");
  %pk = %cols;
}

# P4. Insert in "audet" ######################################################################
foreach $val (keys %cols){
  $is_pk = 0 + exists($pk{$val});
  if ( $_TD->{event} ne "UPDATE" || $is_pk || $_TD->{new}{$val} ne $_TD->{old}{$val} ){
    $before = (exists $_TD->{old}) ? "'".$_TD->{old}{$val}."'" : "NULL";
    $after  = (exists $_TD->{new}) ? "'".$_TD->{new}{$val}."'" : "NULL";
    if ( $_TD->{event} eq "UPDATE" && $_TD->{new}{$val} eq $_TD->{old}{$val} ){
      # Don't save the previous state of a pk column on update if it hasn't changed.
      $before = "NULL";
    }
    $rv = spi_exec_query("select id from field where value = '$val'");
    if ( $rv->{status} ne SPI_OK_SELECT ){
      elog(ERROR, "$elog_pref Error querying table 'field'.");
    }
    if ( $rv->{processed} > 0 ){
      $field = $rv->{rows}[0]->{id};
    } else {
      $rv = spi_exec_query("insert into field (value) values ('$val') returning id");
      if ( $rv->{status} ne SPI_OK_INSERT_RETURNING ){
        elog(ERROR, "$elog_pref Error executing insert returning in table 'field'.");
      }
      $field = $rv->{rows}[0]->{id};
    }
    $rv = spi_exec_query("insert into audet (id, field, is_pk, before, after)
      values ($id, $field, cast($is_pk as boolean), cast($before as text), cast($after as text))");
    if ( $rv->{status} ne SPI_OK_INSERT ){
      elog(ERROR, "$elog_pref Error inserting tuples in table 'audet'.");
    }
  }
}

# P5. Finishing ##############################################################################
return;
$BODY$ LANGUAGE plperl VOLATILE SECURITY DEFINER;
ALTER FUNCTION audit() SET search_path = audit;
ALTER FUNCTION audit() OWNER TO audit;
GRANT EXECUTE ON FUNCTION audit() TO public;
Re: [Solved] Generic logging system for pre-hstore using plperl triggers
From: Diego Augusto Molina
Sh*#@, the code got wrapped! If you can suggest an external site to host the code, I can put it there and send the link. Maybe the wiki? However, I can't seem to publish on the wiki; I think I've followed all the steps but still can't. Anyway, there aren't many long lines, except for those which are comments.

Honestly, I don't remember why I used triggers instead of rules on the "audit" and "audet" tables.

--
Diego Augusto Molina
diegoaugustomolina@gmail.com
Re: [Solved] Generic logging system for pre-hstore using plperl triggers
From: Diego Augusto Molina
2011/9/27, Diego Augusto Molina <diegoaugustomolina@gmail.com>:
> Honestly, I don't remember why I used triggers instead of rules on the
> "audit" and "audet" tables.

I remember now: it's because in my case operations over tuples are done very lightly (one or two per statement at a time). So, for a case like that, rules end up being more expensive than triggers (right?).

--
Diego Augusto Molina
diegoaugustomolina@gmail.com
2011/9/27 Diego Augusto Molina <diegoaugustomolina@gmail.com>:
> /* Created by Diego Augusto Molina in 2011 for Tucuman Government,
> Argentina. */

OK, a few random comments:

*) when posting schema definitions, particularly in email format, try not to use dumped definitions from pg_dump or pgAdmin. This creates a lot of noise in the script that detracts from what you are trying to do. Also, an attached file would probably have been more appropriate.

*) using a smallint key for client_inet is really dubious. Why not just use the inet itself?

*) what is the audet table for? Are you truly storing a record for every field of every audited table? This will be incredibly inefficient, especially for large, wide tables.

*) surely, creating a table called 'table' is not a good idea.

*) this approach is obviously a lot more complicated than hstore. However, for 8.4 and down, hstore won't work. But what about just storing the record as text?

*) I can't quite follow the perl criterion steps -- what is happening there? What are the loops doing?

merlin
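The "record as text" alternative mentioned here can be sketched in a few lines of plpgsql. This is illustrative only, not part of the posted system; the names are made up, and it relies on casting a whole row to text (ROW::text), which flattens the tuple through each column's output function:

```sql
-- Minimal row-as-text audit log: one row per event, whole tuple as text.
CREATE TABLE simple_audit (
    tstmp  timestamptz DEFAULT now(),
    tab    regclass,
    op     text,
    olddat text,
    newdat text
);

CREATE OR REPLACE FUNCTION simple_audit_tg() RETURNS trigger AS $$
BEGIN
    -- Branch per event: NEW is unassigned on DELETE and OLD on INSERT,
    -- so each branch only touches the record it actually has.
    IF TG_OP = 'INSERT' THEN
        INSERT INTO simple_audit (tab, op, newdat)
        VALUES (TG_RELID::regclass, TG_OP, NEW::text);
    ELSIF TG_OP = 'UPDATE' THEN
        INSERT INTO simple_audit (tab, op, olddat, newdat)
        VALUES (TG_RELID::regclass, TG_OP, OLD::text, NEW::text);
    ELSE  -- DELETE
        INSERT INTO simple_audit (tab, op, olddat)
        VALUES (TG_RELID::regclass, TG_OP, OLD::text);
    END IF;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;
```

The trade-off discussed later in the thread applies: this is far simpler and narrower to write, but the logged tuple is opaque text, so tracking one tuple across pk changes becomes hard.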
On Tue, Sep 27, 2011 at 04:52:08PM -0300, Diego Augusto Molina wrote:
> 2011/9/27, Diego Augusto Molina <diegoaugustomolina@gmail.com>:
> > Honestly, I don't remember why I used triggers instead of rules on the
> > "audit" and "audet" tables.
>
> I remember now: it's because in my case operations over tuples are
> done very lightly (one or two per statement at a time). So, for a case
> like that, rules end up being more expensive than triggers (right?).

There's an even better reason not to use rules: they're going away in a not too distant version of PostgreSQL.

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: [Solved] Generic logging system for pre-hstore using plperl triggers
From: Diego Augusto Molina
2011/9/27, Merlin Moncure <mmoncure@gmail.com>:
> *) when posting schema definitions, particularly in email format, try
> not to use dumped definitions from pg_dump or pgadmin. This creates a
> lot of noise in the script that detracts from what you are trying to
> do. Also an attached file would probably have been more appropriate.

Right! I'm sending it attached, and from the source files instead of the pgAdmin III dump. Well, that'll be the next mail; I don't have the files right now.

> *) using smallint key for client_inet is really dubious. why not just
> use the inet itself?

Sure, this has two reasons:

1) I wanted the audit table to be as narrow as possible. Believe me, I migrated the tens of millions of tuples from the previous (slow, inefficient, extremely difficult to maintain) logging system onto a test server just to try things out: the narrower the table, the better it performs with searches. And I mean it! You wouldn't imagine. I don't know what I did with the annotated results, but I will search for them to share.

2) I put many things outside the audit table (the table, schema and field names too); that makes it narrower, but also makes it very easy to see all values without querying the audit table: I can see in one very small table all the inets from clients (and you could gather more info there if you liked). Note that for us most of the accesses to the database come from the web server, which implements a new version of a big application, so it would be a pity to allocate 5 extra bytes (7 bytes for IPv4 versus 2 for smallint) just to see mostly the same IP address. So, why bother logging the IP at all? Well, besides adding completeness, it allows us to see whether they were using the new application or the old one, which accessed the database server directly from the client's computer.

Other fields, namely client_port and pid, aren't mapped out to other tables because they don't increase the table width too much and because I wouldn't analyze those contents very often: like never, but some day there may be some kind of weird attack that needs to be analyzed with this data.

> *) what is the audet table for? Are you truly storing a record for
> every field of every audited table?

See the answer about the criteria.

> *) surely, creating a table called 'table' is not a good idea.

Ok, name it "tables" then.

> *) this approach obviously is a lot more complicated than hstore.
> however, for 8.4 and down, hstore won't work. but, what about just
> storing the record as text?

Third time lucky! See the next...

> *) I can't quite follow the perl criterion steps -- what is happening
> there? What are the loops doing?

We have some very wide tables (20 or more columns). We only log the value of two kinds of columns: i) those which make up the primary key of the table (which helps track down the alteration); and ii) those whose values change in the event. Note that columns in group i) can also be in group ii).

This carries the complexity of the criteria, which are meant to determine the primary key of the table at any cost. Each failing criterion makes the following one take place. These are the criteria I could think of:

1) If we got parameters, _check_ them and consider each of them one column of the primary key. This is the cheapest path and almost always the way through. We really have to check, because if there's e.g. some typo, the whole transaction outside the trigger would fail unconditionally, and we want this logging system to interfere as little as possible. A little less performance in exchange for some more stability.

2) Search the system catalogs for a primary key constraint.

3) Search the system catalogs for the unique constraint with the fewest columns (in fact, I think it should be "the narrowest unique constraint").

4) If the table has OIDs, use that and emit a warning (that's never a real pk, unless you make an external unique index, which I have no intention of checking right now).

5) The "else" (or "default") case is to log every column, emitting a warning (really guys, use primary keys! ;).

We wouldn't bear this complexity every time, but only once: if criterion 1) fails, then after determining the "primary key" we should execute a string which drops this same trigger and re-creates it, passing it the names of the columns which were determined to be the pk, so that next time we don't go further than 1). This works; I tried it out a while ago but never made the change (oops!). I mean dropping the trigger from the trigger itself (the docs say that 'alter trigger' can only rename it).

For my case, all this headache was needed: it was specified as one of the requirements of the logging system that every tuple should be trackable. Using a recursive "with" query, it is possible to track down the changes to any single tuple in the audited tables (or make a function, whatever). Fortunately, they never specified a maximum time for that ;). If instead we had made a string from the record, we wouldn't have been able to easily track the tuple.

Note also the "rotate(character)" function. It hot-rotates the audit/audet tables similarly to a logrotate program on *nix. You never stop logging, and you can dump the old table before dropping it (to save space) and restore it somewhere else to exploit it. We would usually rotate every 6 months or so, but that period keeps shortening as the system grows. The *real* system is very discreet with I/U/D operations: it's not usual to have more than one of those operations per second. For higher-transaction systems I think this logging system would be more noticeable (right now things go smoothly). As I said before, if you have test cases they're very welcome.

--
Diego Augusto Molina
diegoaugustomolina@gmail.com
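The recursive "with" query described above can be sketched as follows. This is a hedged illustration, not the author's actual query: it assumes a single-column pk, a hypothetical field name 'id' with current value '42', and for brevity it skips filtering by schema/table:

```sql
-- Collect every value the tuple's pk has ever had, walking backwards
-- from its current value through logged pk changes, then list all
-- logged changes touching any of those values.
WITH RECURSIVE pk_hist(pkval) AS (
      SELECT '42'::text                 -- current pk value (hypothetical)
    UNION
      SELECT d.before                   -- value the pk had before an update
      FROM pk_hist h
      JOIN audet d ON d.is_pk AND d.after = h.pkval AND d.before IS NOT NULL
)
SELECT a.tstmp, a.type, f.value AS fieldname, d.before, d.after
FROM audit a
JOIN audet d ON d.id = a.id
JOIN field f ON f.id = d.field
WHERE a.id IN (SELECT d2.id
               FROM audet d2
               JOIN pk_hist h ON d2.is_pk AND d2.after = h.pkval)
ORDER BY a.tstmp, a.id;
```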
Rules going away (was: [Solved] Generic logging system for pre-hstore using plperl triggers)
From: Andrew Sullivan
On Wed, Sep 28, 2011 at 06:20:04AM -0700, David Fetter wrote:
>
> There's an even better reason not to use rules: they're going away in
> a not too distant version of PostgreSQL.

Really? How? I thought views were done using rules under the hood?

Also, it'd be awfully nice if, in case rules are going away, the documentation actually said prominently in the rules section, "Rules are deprecated and are planned to be removed at some point in the future." Right now, anyone coming to Postgres for the first time could easily understand the manual to say that Postgres has this cool feature on which they can rely.

Best,

A

--
Andrew Sullivan
ajs@crankycanuck.ca
Re: Rules going away (was: [Solved] Generic logging system for pre-hstore using plperl triggers)
From: Tom Lane
Andrew Sullivan <ajs@crankycanuck.ca> writes:
> On Wed, Sep 28, 2011 at 06:20:04AM -0700, David Fetter wrote:
>> There's an even better reason not to use rules: they're going away in
>> a not too distant version of PostgreSQL.

> Really? How? I thought views were done using rules under the hood?

> Also, it'd be awfully nice if, in case rules are going away, the
> documentation actually said prominently in the rules section, "Rules
> are deprecated and are planned to be removed at some point in the
> future." Right now, anyone coming to Postgres for the first time
> could easily understand the manual to say that Postgres has this cool
> feature on which they can rely.

I think the true state of affairs is this: rules have a lot of surprising behaviors, and if we could think of something that works more straightforwardly, we'd love to replace them. But I think we'd have to have the "something" in place before we consider deprecating rules. At the moment we don't even have a glimmer of a design, so David's statement is many years premature.

regards, tom lane
Re: Rules going away (was: [Solved] Generic logging system for pre-hstore using plperl triggers)
From
Andrew Sullivan
Date:
On Wed, Sep 28, 2011 at 10:34:32AM -0400, Tom Lane wrote: > I think the true state of affairs is this: rules have a lot of > surprising behaviors, and if we could think of something that works more > straightforwardly, we'd love to replace them. Oh. Well, _that's_ not news :-) -- Andrew Sullivan ajs@crankycanuck.ca
On Wed, Sep 28, 2011 at 8:20 AM, Diego Augusto Molina <diegoaugustomolina@gmail.com> wrote: > 2011/9/27, Merlin Moncure <mmoncure@gmail.com>: >> *) when posting schema definitions, particularly in email format, try >> not to use dumped definitions from pg_dump or pgadmin. This creates a >> lot of noise in the script that detracts from what you are trying to >> do. Also an attached file would probably have been more appropriate. > > Right! I'm sending it attached and from the source files instead of > the pgAdminIII dump. Well, that'll be the next mail, I don't have the > files right now. > >> *) using smallint key for client_inet is really dubious. why not just >> use the inet itself? > > Sure, this has two reasons: > > 1) I wanted the audit table to be as narrow as possible. Believe me, I > migrated the tens of millions of tuples from the previous (slow, > inefficient, extremely difficult to maintain) logging system on a test > server just to try things out: the narrower the table, the better it > performs with searches. And I mean it! You wouldn't imagine. I don't > know what I did with the annotated results, but I will search for them > to share that. > > 2) I put many things outside the audit table (like the table, schema > and field names too); that makes it narrower but also makes it very > easy to see all values without querying the audit table: I can see in > a very small table all inet's from clients (and you could gather more > info if you would like too). Note that for us most of the accesses to > the database come from the web server which implements a new version > of a big application, so it would be a pity to allocate 5 extra bytes > (7 bytes for IPv4, 2 for smallint) just to see mostly the same IP > address. So, why bother logging the IP at all? Well, besides adding > completeness, it allows us to see if they were using the new > application or the old one, which accessed the database server > directly from the client's computer. 
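[Editorial aside: the lookup-table normalization described above can be sketched roughly as follows; the table and column names here are hypothetical, not Diego's actual schema.]

```sql
-- Map each distinct client address to a 2-byte key, and store only that
-- key in the (hot, frequently-scanned) audit table.
CREATE TABLE client_inet (
    id   smallint PRIMARY KEY,
    addr inet NOT NULL UNIQUE
);

-- The audit table then carries the 2-byte key instead of a 7+ byte inet:
--   ..., client_inet smallint REFERENCES client_inet (id), ...
```

The same pattern applies to the schema, table, and field name lookups he mentions: the audit table stores only small integer keys, and the lookup tables double as a catalog of every value ever seen.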
> > Other fields, namely client_port and pid, aren't mapped out to other > tables because they do not increase the table width too much and > because I wouldn't analyze those contents very often: like never, but > some day there may be some kind of weird attack that needs to be > analyzed with this data. > >> *) what is the audet table for? Are you truly storing a record for >> every field of every audited table? This will be incredibly >> inefficient, especially for large, wide tables. > > See the answer about criteria. > >> *) surely, creating a table called 'table' is not a good idea. > > Ok, name it "tables" then. > >> *) this approach obviously is a lot more complicated than hstore. >> however, for 8.4 and down, hstore won't work. but, what about just >> storing the record as text? > > Third time lucky! see the next... > >> *) I can't quite follow the perl criterion steps -- what is happening >> there? What are the loops doing? > > We have some very wide tables (like 20 or more columns). We only log > the value of two kinds of columns: i) those which make up the primary > key of the table (which helps to track down the alteration); and ii) > those whose values change in the event. Note that columns in group i) > can also be in group ii). > This carries the complexity of the criteria, which are meant to > determine the primary key of the table at any cost. Each failing > criterion causes the following one to take place. These are the > criteria I could think of: > 1) If we got parameters, _check_ them and consider each of them as one > column of the primary key. This is the cheapest path and almost always > the way through. We really have to check, because if there's e.g. some typo > the whole transaction outside the trigger would fail unconditionally > and we want this logging system to interfere as little as > possible. A little less performance in exchange for some more > stability. > 2) Search in the system catalogs for a primary key constraint. 
> 3) Search in the system catalogs for the unique constraint which has > the fewest columns (in fact, I think it should be "the narrowest unique > constraint"). > 4) If the table has OIDs, use that and emit a warning (that's never a > real pk, unless you make an external unique index, which I don't > intend to check right now). > 5) The "else" (or "default") case is to log every column, emitting a > warning (Really guys, use primary keys! ;). > We don't bear this complexity every time but only once: if > criterion 1) fails, after determining the "primary key" we should > execute a string which drops this same trigger and re-creates it, > passing it the names of the columns which were determined to be the pk > so that the next time we don't go further than 1). This works, I tried > it out some time ago but never did the change (oops!). I mean, dropping the > trigger from the trigger itself (in the docs it says that 'alter > trigger' can only rename it). > > For my case, all this headache was needed: it was specified as one of > the requirements of the logging system that every tuple should be > trackable. Using a recursive "with" query, it is possible to track > down the changes to any single tuple in the audited tables (or make a > function, whatever). Fortunately, they never specified a maximum time > for that ;). If instead we had made a string from the record, > we wouldn't have been able to easily track the tuple. > > Note also the "rotate(character)" function. It hot-rotates the > audit/audet tables similarly to a logrotate program in *nix. At the > same time, you never stop logging, and you can dump the old table > before dropping it (to save space) and restore it somewhere else to > exploit it. We would usually rotate every 6 months or so, but that > interval keeps shrinking as the system grows. > > The *real* system sees very few I/U/D operations: it's not > usual to have more than one of those operations per second. 
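[Editorial aside: criterion 2 above -- finding the primary key in the system catalogs -- can be done with a query along these lines. This is a sketch only, not necessarily the exact lookup the trigger performs, and the table name is a placeholder.]

```sql
-- Column names of a table's primary key, from the 8.x-era catalogs.
SELECT a.attname
FROM pg_constraint c
JOIN pg_attribute a
  ON a.attrelid = c.conrelid
 AND a.attnum   = ANY (c.conkey)
WHERE c.conrelid = 'public.some_table'::regclass
  AND c.contype  = 'p'          -- 'u' instead would serve criterion 3
ORDER BY a.attnum;              -- note: attnum order, not conkey order
```

Criterion 3 would be the same query with `contype = 'u'`, keeping the constraint with the fewest elements in `conkey`.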
For higher > transactional systems I think this logging system would be more > noticeable (right now things go smooth). As I said before, if you have > test cases they're very welcome. Yup -- I get it now. Well, one point about this is that it seems directed towards your personal requirements. This is a very 'heavy' audit system that will not be suitable for high performance transactional systems. That said, it looks well thought out. Storing only the changed fields is pretty clever, but I can't help but wonder if you're better off using arrays for that purpose: create type audit_field_t as (field text, old_value text, new_value text); and inside the audit table itself having fields audit_field_t, and, if at all possible, constructing the array of audit fields in a single expression. This will be much more compact than one record per field -- normally, arrays in table definitions tend to be bad mojo but this is one case they could be useful. Audit records are WORM, 'Write Once Read Maybe', so compactness is important. Obviously, for 9.0+, I would be rigging a solution around hstore for an 'all sql' solution which is usually better if you can get away with it. merlin
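[Editorial aside: Merlin's suggestion spelled out as DDL. This is a sketch -- the columns besides `fields` are hypothetical, and arrays of composites require 8.3 or later, as noted further down the thread.]

```sql
CREATE TYPE audit_field_t AS (field text, old_value text, new_value text);

CREATE TABLE audit (
    id        bigserial   PRIMARY KEY,
    tstamp    timestamptz NOT NULL DEFAULT now(),
    tablename text        NOT NULL,
    fields    audit_field_t[]  -- one element per changed column, ideally
                               -- built in a single expression in the trigger
);
```

The point of the array is compactness: one audit row per statement-affected tuple, rather than one audet row per changed field.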
Re: [Solved] Generic logging system for pre-hstore using plperl triggers
From
Diego Augusto Molina
Date:
2011/9/28, Merlin Moncure <mmoncure@gmail.com>: > Yup -- I get it now. Well, one point about this is that it seems > directed towards your personal requirements. This is a very 'heavy' > audit system that will not be suitable for high performance > transactional systems. That said, it looks well thought out. Storing > only the changed fields is pretty clever, but I can't help but wonder > if you're better off using arrays for that purpose: > > create type audit_field_t as (field text, old_value text, new_value text); > > and inside the audit table itself having > fields audit_field_t, > > and, if at all possible, constructing the array of audit fields in a > single expression. This will be much more compact than one record per > field -- normally, arrays in table definitions tend to be bad mojo but > this is one case they could be useful. Audit records are WORM, 'Write > Once Read Maybe', so compactness is important. Obviously, for 9.0+, > I would be rigging a solution around hstore for an 'all sql' solution > which is usually better if you can get away with it. > > merlin > Well that sounds pretty fair to me. But that flow would not allow me to make partial indexes on primary key fields. As you can see in the "audet" table, there's a column named "is_pk" which tells if that column was considered a primary key at the moment of the logging. Normally there are no indexes, but when I have to make some audits I do the following: 1) Dump the audits. 2) Restore somewhere else. 3) Generate some indexes on: timestamp, schema|table, field|is_pk and id (I think, I've got the procedure annotated too, but not here hehe). This indexing is a pain sometimes but even adding it to the time it takes to run one query it is really cheap. Making the indexes gets far more necessary if you run more than one query (which is probably the case). I had considered the solution you're posting, but it would be a _real_ pain to run a query with 'unnest's and 'array_agg's. 
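[Editorial aside: steps 1)-3) above might look like this on the analysis server, after restoring the dump. The column names are guesses at the schema from the audit/audet split described in the thread.]

```sql
-- Post-restore indexing for ad-hoc audit queries.
CREATE INDEX audit_tstamp_idx ON audit (tstamp);
CREATE INDEX audit_table_idx  ON audit (schema_name, table_name);
CREATE INDEX audet_pk_idx     ON audet (field) WHERE is_pk;  -- partial: PK columns only
CREATE INDEX audet_audit_idx  ON audet (audit_id);           -- join back to audit
```

The partial index on `is_pk` is the one the array layout would lose, which is Diego's objection here.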
Also, note that some of these may not be available in versions of PostgreSQL prior to 8.4 (I think), so if you're planning to track the tupple you won't be able to do it in clear (maybe using temp tables). But! all those arguments above get beat by only one you asserted: that "WORM" thing. You are defintly right about that. Logging in the majority of the cases should be meaningful, light to run, compact/compressed, and rotated so that it doesn't take up all your space with time. Having said that, I'm going to take your advice for the next version, which I hope that also checks some TODO's in the list. When I get home I'll send the current code attached and when I get some fresh air at work I'll make the changes and post the new version. Any other ideas for the new version? (get some previews in the TODO list at the top of the perl trigger function in the attachment of the next mail). -- Diego Augusto Molina diegoaugustomolina@gmail.com ES: Por favor, evite adjuntar documentos de Microsoft Office. Serán desestimados. EN: Please, avoid attaching Microsoft Office documents. They shall be discarded. LINK: http://www.gnu.org/philosophy/no-word-attachments.html
On 09/28/2011 08:34 AM, Tom Lane wrote: > Andrew Sullivan <ajs@crankycanuck.ca> writes: >> On Wed, Sep 28, 2011 at 06:20:04AM -0700, David Fetter wrote: >>> There's an even better reason not to use rules: they're going away in >>> a not too distant version of PostgreSQL. >> Really? How? I thought views were done using rules under the hood? >> Also, it'd be awfully nice if, in case rules are going away, the >> documentation actually said prominently in the rules section, "Rules >> are deprecated and are planned to be removed at some point in the >> future." Right now, anyone coming to Postgres for the first time >> could easily understand the manual to say that Postgres has this cool >> feature on which they can rely. > I think the true state of affairs is this: rules have a lot of > surprising behaviors, and if we could think of something that works more > straightforwardly, we'd love to replace them. But I think we'd have to > have the "something" in place before we consider deprecating rules. > At the moment we don't even have a glimmer of a design, so David's > statement is many years premature. > > regards, tom lane > Yoda: "Damned by faint praise, these rules are" Would this future something more likely be a totally different concept or a re-implementation? Is there a list of "gotchas" w.r.t. rules?
On Wed, Sep 28, 2011 at 11:45 AM, Diego Augusto Molina <diegoaugustomolina@gmail.com> wrote: > 2011/9/28, Merlin Moncure <mmoncure@gmail.com>: >> Yup -- I get it now. Well, one point about this is that it seems >> directed towards your personal requirements. This is a very 'heavy' >> audit system that will not be suitable for high performance >> transactional systems. That said, it looks well thought out. Storing >> only the changed fields is pretty clever, but I can't help but wonder >> if you're better off using arrays for that purpose: >> >> create type audit_field_t as (field text, old_value text, new_value text); >> >> and inside the audit table itself having >> fields audit_field_t, >> >> and, if at all possible, constructing the array of audit fields in a >> single expression. This will be much more compact than one record per >> field -- normally, arrays in table definitions tend to be bad mojo but >> this is one case they could be useful. Audit records are WORM, 'Write >> Once Read Maybe', so compactness is important. Obviously, for 9.0+, >> I would be rigging a solution around hstore for an 'all sql' solution >> which is usually better if you can get away with it. >> >> merlin >> > > Well that sounds pretty fair to me. But that flow would not allow me > to make partial indexes on primary key fields. As you can see in the > "audet" table, there's a column named "is_pk" which tells if that > column was considered a primary key at the moment of the logging. > Normally there's no indexes, but when I have to make some audits I do > the following: > 1) Dump the audits. > 2) Restore somewhere else. > 3) Generate some indexes on: timestamp, schema|table, field|is_pk and > id (I think, I've got the procedure annotated too, but not here hehe). > This indexing is a pain sometimes but even adding it to the time it > takes to run one query it is really cheap. Making the indexes gets far > more necessary if you run more than one query (which is probably the > case). 
> I had considered the solution you're posting, but it would get a > _real_ pain to run a query with 'unnest's and 'array_agg's. Also, note > that some of these may not be available in versions of PostgreSQL > prior to 8.4 (I think), so if you're planning to track the tupple you > won't be able to do it in clear (maybe using temp tables). I disagree. unnest() and array_agg() (or, even better, array() constructor syntax) are an absolute joy to work with and thinking in a more functional way, which is usually the key to making things run quickly. Also both functions are trivial to emulate in userland for compatibility. Arrays of composites IIRC only go back to 8.3 so that would be a true stopper for any solution in that vein. As for the rest of it, I'd be looking to try and come up with an all sql implementation. Also you should give an honest comparison between what you've come up with vs. this: http://pgfoundry.org/projects/tablelog/. merlin
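[Editorial aside: with the `audit_field_t[]` layout, expanding audit rows back to one row per changed field is a short query. This sketch uses a set-returning `unnest()` in a subquery, which works on 8.4 without LATERAL; column names are hypothetical.]

```sql
-- One output row per changed field.
SELECT id, tstamp, (f).field, (f).old_value, (f).new_value
FROM (SELECT id, tstamp, unnest(fields) AS f
      FROM audit) s;
```

On 9.3+ the same thing reads more naturally as `FROM audit, LATERAL unnest(fields) AS f`.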
On Wed, Sep 28, 2011 at 10:53 AM, Rob Sargent <robjsargent@gmail.com> wrote: > > > On 09/28/2011 08:34 AM, Tom Lane wrote: >> Andrew Sullivan <ajs@crankycanuck.ca> writes: >>> On Wed, Sep 28, 2011 at 06:20:04AM -0700, David Fetter wrote: >>>> There's an even better reason not to use rules: they're going away in >>>> a not too distant version of PostgreSQL. >>> Really? How? I thought views were done using rules under the hood? >>> Also, it'd be awfully nice if, in case rules are going away, the >>> documentation actually said prominently in the rules section, "Rules >>> are deprecated and are planned to be removed at some point in the >>> future." Right now, anyone coming to Postgres for the first time >>> could easily understand the manual to say that Postgres has this cool >>> feature on which they can rely. >> I think the true state of affairs is this: rules have a lot of >> surprising behaviors, and if we could think of something that works more >> straightforwardly, we'd love to replace them. But I think we'd have to >> have the "something" in place before we consider deprecating rules. >> At the moment we don't even have a glimmer of a design, so David's >> statement is many years premature. >> >> regards, tom lane >> > Yoda: "Damned by faint praise, these rules are" > > Would this future something more likely be a totally different concept > or a re-implementation? > > Is there a list of "gotchas" w.r.t. rules? yes, and it is huge -- see the archives. note that views are based on rules (SELECT rules) and there is nothing wrong there -- so it's not quite correct to say they will be going away completely. view triggers removed the #1 thing that most people wanted to do with rules, namely updatable views. the basic problem with rules is that by hacking the sql you send to the server, you lose control over various aspects of the statement that normally are fairly predictable. they are almost impossible to get working properly -- I've tried many times. 
aside from the multiple evaluation thing, you have bizarre interactions with many sql features that came in later (like RETURNING). folks, don't use RULES! use triggers -- and as much as possible, keep triggers simple, short, and to the point (simple validation, custom RI, auditing/logging, etc). merlin
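[Editorial aside: as an illustration of "simple, short, and to the point" -- this is not the plperl system from this thread, just a bare-bones plpgsql sketch with hypothetical table and column names, assuming the audited table has an integer `id` primary key.]

```sql
CREATE TABLE audit_log (
    tstamp    timestamptz NOT NULL DEFAULT now(),
    tablename text        NOT NULL,
    op        text        NOT NULL,   -- 'INSERT' / 'UPDATE' / 'DELETE'
    row_pk    integer     NOT NULL
);

CREATE OR REPLACE FUNCTION log_change() RETURNS trigger AS $$
BEGIN
    INSERT INTO audit_log (tablename, op, row_pk)
    VALUES (TG_TABLE_NAME, TG_OP,
            CASE WHEN TG_OP = 'DELETE' THEN OLD.id ELSE NEW.id END);
    RETURN NULL;  -- AFTER row trigger: return value is ignored
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER some_table_audit
AFTER INSERT OR UPDATE OR DELETE ON some_table
FOR EACH ROW EXECUTE PROCEDURE log_change();
```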
Hi, > folks, don't use RULES! use triggers -- and as much as possible, keep > triggers simple, short, and to the point (simple validation, custom > RI, auditing/logging, etc). I like them :). 'DO INSTEAD' rules are great for partitioning so you can insert (or update) into the parent table and the 'DO INSTEAD' rule takes care of the rest. -- Ondrej Ivanic (ondrej.ivanic@gmail.com)
> -----Original Message----- > From: Ondrej Ivanič [mailto:ondrej.ivanic@gmail.com] > Sent: Wednesday, September 28, 2011 6:47 PM > To: pgsql-general@postgresql.org > Subject: Re: Rules going away > > Hi, > > > folks, don't use RULES! use triggers -- and as much as possible, keep > > triggers simple, short, and to the point (simple validation, custom > > RI, auditing/logging, etc). > > I like them :). 'DO INSTEAD' rules are great for partitioning so you > can insert (or update) to parent table and 'DO INSTEAD' rule takes > care about the rest. > > -- > Ondrej Ivanic > (ondrej.ivanic@gmail.com) Years ago (PG version 8.2), I was working on implementing partitioned tables, and evaluated whether to use triggers or rules to "automagically" redirect inserts/updates/deletes from the main table to the appropriate partition based on the value of the partitioning key. Unfortunately, I didn't document my research, but the result was that triggers were unusable for my purposes, and I ended up with the rules calling functions where, using dynamic sql, I decide which partition should be used for insert/update/delete (at least on "selects" PG takes care of choosing the proper partition, of course with a properly set configuration parameter). These rules (and the functions they call) have worked perfectly for me all these years in many, many, many installations. So, until PG takes care of the problem I'm solving with the rules (implementing a "complete" partitioning feature, and not just on "selects"), please leave rules where they are; they definitely have their use when properly implemented for a specific purpose. Regards, Igor Neyman
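[Editorial aside: the pattern Igor describes -- a rule handing the row to a function that picks the partition with dynamic SQL -- looks roughly like this. These are hypothetical names and a sketch of the approach, not his actual code; `EXECUTE ... USING` is 8.4+, so an 8.2 version would concatenate the values instead.]

```sql
-- Route an insert on the parent to a monthly partition.
CREATE OR REPLACE FUNCTION route_insert(ts timestamptz, val numeric)
RETURNS void AS $$
BEGIN
    -- Dynamic SQL: the partition name is derived from the partitioning key.
    EXECUTE 'INSERT INTO measurements_' || to_char(ts, 'YYYYMM')
         || ' VALUES ($1, $2)' USING ts, val;
END;
$$ LANGUAGE plpgsql;

CREATE RULE measurements_ins AS ON INSERT TO measurements
    DO INSTEAD SELECT route_insert(NEW.tstamp, NEW.value);
```

As Gregg notes just below, the known limitation of this scheme is that `INSERT ... RETURNING` against the parent will not behave as expected.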
Speaking of DO INSTEAD, for the insert/update case: try using RETURNING with that and rules ;) Good luck
> -----Original Message----- > From: Gregg Jaskiewicz [mailto:gryzman@gmail.com] > Sent: Friday, September 30, 2011 5:18 AM > To: Igor Neyman > Cc: Ondrej Ivanič; pgsql-general@postgresql.org > Subject: Re: [GENERAL] Rules going away > > speaking of DO INSTEAD, for insert/update case. Try using RETURNING > with that and rules ;) Good luck Well, I designed it for a specific case. And in my application I had no need for the RETURNING clause. That's why I'm saying, it works perfectly for me, but I'm not trying to generalize, like those who say "Rules are evil and don't ever use them". Regards, Igor Neyman
You're right, rules are perfect for very limited and narrow cases. And they make it very hard to write complicated queries (e.g., updates that only touch a few columns, likewise with inserts). I'm guessing the upside is that rules are faster than triggers.
Re: [Solved] Generic logging system for pre-hstore using plperl triggers
From
Diego Augusto Molina
Date:
2011/9/28, Merlin Moncure <mmoncure@gmail.com>: > > I disagree. unnest() and array_agg() (or, even better, array() > constructor syntax) are an absolute joy to work with and thinking in a > more functional way, which is usually the key to making things run > quickly. Also both functions are trivial to emulate in userland for > compatibility. Arrays of composites IIRC only go back to 8.3 so that > would be a true stopper for any solution in that vein. Ok, tastes are tastes: I hate to make two or three more levels of subqueries. Regarding arrays of composites, that would be perfectly solved if we use no composite at all! Instead of a field with an array of a composite of three intrinsics, use three fields, each of an intrinsic type. See your proposal: >> create type audit_field_t as (field text, old_value text, new_value text); Instead, in the audit table you may use: ..., field smallint[], before text[], after text[],... Note the smallint in field, that means I really want to keep the reference to the "field" table. That is for the two reasons I had mentioned earlier (to reduce space: 2 bytes of type "smallint" against variable size of type "text"; and to keep track of names been used too). You can also set up something like this if you like dimensions: ..., field smallint[], values text[][],... Implying that the first dimension is the "before" value and the second one is for the "after" value. Any of these prevents us from using composites and makes the box a little wider and simpler. Even further, I would like to keep the logging "on demand": ..., field smallint[], is_pk boolean[], { before text[], after text[] | values text[][] },... You know what the braces and pipe are for... So, at the end, we have the entire "audet" table inside the "audit" table, as a series of arrays. We get a really compact table with only enough data to fully log the changes which triggered the event. No less, no more. 
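[Editorial aside: spelled out, the compacted single-table layout proposed above would be something like the following. This is hypothetical DDL merging audet into audit; the lookup-table names follow the earlier discussion.]

```sql
CREATE TABLE audit (
    id        bigserial   PRIMARY KEY,
    tstamp    timestamptz NOT NULL DEFAULT now(),
    tableid   smallint    NOT NULL,  -- FK to the "tables" lookup table
    field     smallint[]  NOT NULL,  -- keys into the "field" lookup, by position
    is_pk     boolean[]   NOT NULL,  -- aligned with field[]
    before    text[],                -- old values, aligned with field[]
    after     text[]                 -- new values, aligned with field[]
);
```

All four arrays must stay positionally aligned: element N of each array describes the same column of the audited tuple.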
At this point we know querying this table will be much slower and rotation will have to be done more frequently. If we dump>restore the table somewhere else we will still be able to split the table back into the original two, and make indexes, cluster them, and query as desired. But this can get so complicated that maybe I should implement a function doing all this. In any event, we lose some responsiveness because of this. But a couple of minutes more may not be a problem for most cases. I'm just trying to summarize. As a rule of thumb, you may need to run a cron job every night or so to check if 'select count(*) from audit' is bigger than X and then rotate the table (or maybe every X days/weeks/etc.). The smaller the X, the better the responsiveness _in_ some cases: if we know an interval in time we will just have to dump>restore those logs. In other cases this would not be of much help: if you need to track a tuple to the very beginning of time, you'll have a lot of work to do dumping>restoring (and so forth... remember to split the table, indexing...). Still, rotation seems to be a good practice, and you can include in the cron job the dump/restore part into another server and then delete the old table. That would save a lot of space in your production environment. > As for the rest of it, I'd be looking to try and come up with an all > sql implementation. Also you should give an honest comparison between > what you've come up with vs. this: > http://pgfoundry.org/projects/tablelog/. > > merlin > "All SQL implementation"? Didn't we agree that's not possible in pg<=8.4? then what do you mean by that? About project "tablelog", I didn't really try it, but read its documentation and it seems not appropriate at all for my case. First of all, its purpose seems to be to log everything in a table to be able to restore it later as of any time in the past. My purpose is to log in order to run analyses. 
Also, it needs to create one table per logged table, with the same structure as the logged table (without constraints) plus three, four or five columns for control (depending on usage, four or five recommended). I have a lot of tables to log (hundreds!) with small changes to each of them; that means duplicating the number of tables for a few changes. Speaking of compactness... It also logs everything, not only changed values. It is written in C, so I assume it runs much, much faster (especially needed for highly transactional DBs). But it's not proven to be binary safe (and I don't remember what that is). Bugs: nothing known. So, if you need to be able to restore your table as of any time, use tablelog. If you need to run analysis on who did what, use my option. Finally attaching the code! Cheers. -- Diego Augusto Molina diegoaugustomolina@gmail.com ES: Por favor, evite adjuntar documentos de Microsoft Office. Serán desestimados. EN: Please, avoid attaching Microsoft Office documents. They shall be discarded. LINK: http://www.gnu.org/philosophy/no-word-attachments.html
Attachment
Re: [Solved] Generic logging system for pre-hstore using plperl triggers
From
Diego Augusto Molina
Date:
Some errata and round-up comments. First of all, I think it's relevant to give some info about the system I'm running: OS: Linux Distro: Gentoo Perl: dev-lang/perl-5.12.3-r1 USE="-* berkdb gdbm" Gentoo Base System release: 2.0.2 PORTDIR/metadata/timestamp.chk: Tue, 12 Jul 2011 00:45:01 +0000 2011/10/2, Diego Augusto Molina <diegoaugustomolina@gmail.com>: > solved if we use no composite at all! Instead of a field with an array > of a composite of three intrinsics, use three fields, each of an > intrinsic type. See your proposal: Errata: "use three fields, each *one being an array of an intrinsic type.*" > mentioned earlier (to reduce space: 2 bytes of type "smallint" against > variable size of type "text"; and to keep track of names been used > too). You can also set up something like this if you like dimensions: Errata: "and to keep track of names *being* used too)" > Implying that the first dimension is the "before" value and the second > one is for the "after" value. Any of these prevents us from using > composites and makes the box a little wider and simpler. Even further, > I would like to keep the logging "on demand": Comment on <<I would like to keep the logging "on demand":>> With this I mean to log only those columns which are part of the primary key or whose values have changed -- Diego Augusto Molina diegoaugustomolina@gmail.com ES: Por favor, evite adjuntar documentos de Microsoft Office. Serán desestimados. EN: Please, avoid attaching Microsoft Office documents. They shall be discarded. LINK: http://www.gnu.org/philosophy/no-word-attachments.html
Hi all, I've been using RULES for some time now, and despite the fact that I always had to make "experiments" to get the expected results, it worked for me just fine. Now I have this simple scenario:
---------------------------------------------------
mbr2=# CREATE TEMP TABLE test(a int, b date);
CREATE TABLE
mbr2=# INSERT INTO test (a) values (22);
INSERT 0 1
mbr2=# CREATE TEMP VIEW testing AS SELECT * from test;
CREATE VIEW
mbr2=# SELECT * from testing;
 a  | b
----+---
 22 |
(1 row)

mbr2=# CREATE RULE chg_testing_1 AS ON UPDATE TO testing where new.b is not null do instead select '+++';
CREATE RULE
mbr2=# CREATE RULE chg_testing_2 AS ON UPDATE TO testing do instead select '===';
CREATE RULE
mbr2=# UPDATE testing set a=44;
 ?column?
----------
 ===
(1 row)

UPDATE 0
mbr2=# UPDATE testing set a=44, b=current_date;
 ?column?
----------
 ===
(1 row)

UPDATE 0
mbr2=#
------------------------------------------
And I'm completely lost as to how to interpret the outcome: 1. I know that RULEs, contrary to TRIGGERs, "rewrite" the query, resulting in the actual execution of a different query from the one originally scheduled 2. while triggers "take" rows of processed data "and do their job there". Now, the above example does not behave as I would expect. I really don't understand why both of the above updates give the same result here, and it boils down to: how come a RULE can have a look at "NEW.B", if it's supposed to be rewritten *before* any rows are fetched for processing? When exactly does it "have a look" at NEW.B? How do I ensure that the "DO INSTEAD NOTHING" rule *does*not* get executed when a conditional rule does? Can someone give a little explanation here ... or point me to "for dummies" documentation? (yes, I've been through the postgres docs, to no avail). I'll appreciate that, really. -R
On Mon, Oct 10, 2011 at 04:06:34PM +0200, Rafal Pietrak wrote: > Hi all, first of all - why did you send this mail as a reply to some 2-week-old thread, instead of just starting a new thread? > Can someone give a little explanation here ... or point me to "for > dummies" documentation? (yes, I've been through the postgres docs, to no > avail). perhaps this will help: http://www.depesz.com/index.php/2010/06/15/to-rule-or-not-to-rule-that-is-the-question/ Best regards, depesz -- The best thing about modern society is how easy it is to avoid contact with it. http://depesz.com/
On Mon, 2011-10-10 at 16:48 +0200, hubert depesz lubaczewski wrote: > On Mon, Oct 10, 2011 at 04:06:34PM +0200, Rafal Pietrak wrote: > > Hi all, > > first of all - why did you send this mail as reply to some 2-weeks old > thread, instead of just start of new thread? Sorry for that. Old habits... always forget that. > > > Can someone give a little explenation here ... or point me to "for > > dummies" documentation? (yes, I've been though postgres docs, to no > > avail). > > perhaps this will help: > > http://www.depesz.com/index.php/2010/06/15/to-rule-or-not-to-rule-that-is-the-question/ > Hmmm, not really. Still, more puzzles are there, so maybe they'll guide me to the answer eventually. One thing I've spotted there is that earlier I had naively assumed that when I define a RULE INSTEAD, the original query is "discarded" in the final rewritten query. The example found at your link shows that it isn't ... which is *extremely* strange, but somehow explains what I get in my set of rules that were "supposedly" exclusive. ... or maybe this "theory" is also wrong :( Anyway, thanks for the link. -R