Thread: [Solved] Generic logging system for pre-hstore using plperl triggers
From: Diego Augusto Molina
Hi, I had to implement a logging system for some DBs at work. It's generic and implemented in plperl. I've seen tons of mails on the list from newbies asking for something like this in plpgsql, but no specific solution is pointed out for them, which I think may discourage some of them.

The system lives in a separate schema (with whatever name you want) and has some really nice features: the relevant tables can be rotated to facilitate auditing, each logged action refers to the modified tuple by primary key, which you don't even have to (but can) specify when creating the trigger, and more. Performance is very acceptable (test cases and suggestions are welcome), and it hasn't had a problem in the roughly 8 months it's been working.

The wiki mentions some approaches, but it's all _too_ general, and on a vanilla pg 8.4 you won't have the hstore facility (note that upgrading is not always a choice).

Is it worth pasting the little code in here, or will that be ignored?

--
Diego Augusto Molina
diegoaugustomolina@gmail.com
ES: Por favor, evite adjuntar documentos de Microsoft Office. Serán desestimados.
EN: Please, avoid attaching Microsoft Office documents. They shall be discarded.
LINK: http://www.gnu.org/philosophy/no-word-attachments.html
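As a concrete sketch of the usage described, attaching the system to a table would look roughly like this (the table `person` and its columns are hypothetical; `audit.audit()` is the generic trigger function of the system):

```sql
-- Hypothetical audited table.
CREATE TABLE public.person (
    id   integer PRIMARY KEY,
    name text NOT NULL
);

-- Attach the generic audit trigger. The argument names the pk column;
-- with no arguments the trigger probes the catalogs for the pk itself.
CREATE TRIGGER tg_audit_person
    AFTER INSERT OR UPDATE OR DELETE ON public.person
    FOR EACH ROW EXECUTE PROCEDURE audit.audit('id');
```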
Re: [Solved] Generic logging system for pre-hstore using plperl triggers
From: Filip Rembiałkowski
2011/9/27 Diego Augusto Molina <diegoaugustomolina@gmail.com>
If you can share it, do it; maybe someone will make use of your work, and that's always welcome.
Publishing it on the web and linking to it here, instead of pasting, would be even better.
Filip
Re: [Solved] Generic logging system for pre-hstore using plperl triggers
From: Diego Augusto Molina
/* Created by Diego Augusto Molina in 2011 for Tucuman Government, Argentina. */

/* -- Execute the following according to your needs.
CREATE TRUSTED PROCEDURAL LANGUAGE 'plpgsql';
CREATE TRUSTED PROCEDURAL LANGUAGE 'plperl';
*/

CREATE ROLE auditor NOSUPERUSER NOINHERIT NOCREATEDB NOCREATEROLE;
CREATE ROLE audit LOGIN ENCRYPTED PASSWORD 'test.1234'
  NOSUPERUSER NOINHERIT NOCREATEDB NOCREATEROLE;

CREATE SCHEMA audit AUTHORIZATION audit;
ALTER ROLE auditor SET search_path = audit;
ALTER ROLE audit SET search_path = audit;
SET search_path = audit;
SET SESSION AUTHORIZATION audit;

CREATE SEQUENCE seq_audit INCREMENT 1
  MINVALUE -9223372036854775808 MAXVALUE 9223372036854775807
  START 0 CACHE 1 CYCLE;
ALTER TABLE seq_audit OWNER TO audit;

CREATE SEQUENCE seq_elems INCREMENT 1
  MINVALUE -32768 MAXVALUE 32767 START 0 CACHE 1 CYCLE;
ALTER TABLE seq_elems OWNER TO audit;

-- Dictionary tables: each holds the distinct values seen so far,
-- keyed by a smallint to keep the main log tables narrow.
CREATE TABLE field (
  id    smallint NOT NULL DEFAULT nextval('seq_elems'::regclass),
  value name NOT NULL,
  CONSTRAINT field_pk       PRIMARY KEY (id) WITH (FILLFACTOR=100),
  CONSTRAINT field_uq_value UNIQUE (value)   WITH (FILLFACTOR=100)
) WITH (OIDS=FALSE);
ALTER TABLE field OWNER TO audit;
GRANT ALL ON TABLE field TO audit;
GRANT SELECT ON TABLE field TO auditor;

CREATE TABLE client_inet (
  id    smallint NOT NULL DEFAULT nextval('seq_elems'::regclass),
  value inet NOT NULL DEFAULT inet_client_addr(),
  CONSTRAINT dir_inet_pk       PRIMARY KEY (id) WITH (FILLFACTOR=100),
  CONSTRAINT dir_inet_uq_value UNIQUE (value)   WITH (FILLFACTOR=95)
) WITH (OIDS=FALSE);
ALTER TABLE client_inet OWNER TO audit;
GRANT ALL ON TABLE client_inet TO audit;
GRANT SELECT ON TABLE client_inet TO auditor;

CREATE TABLE schema (
  id    smallint NOT NULL DEFAULT nextval('seq_elems'::regclass),
  value name NOT NULL,
  CONSTRAINT schema_pk       PRIMARY KEY (id) WITH (FILLFACTOR=100),
  CONSTRAINT schema_uq_value UNIQUE (value)   WITH (FILLFACTOR=100)
) WITH (OIDS=FALSE);
ALTER TABLE schema OWNER TO audit;
GRANT ALL ON TABLE schema TO audit;
GRANT SELECT ON TABLE schema TO auditor;

-- "table" and "user" are reserved words and must be double-quoted.
CREATE TABLE "table" (
  id    smallint NOT NULL DEFAULT nextval('seq_elems'::regclass),
  value name NOT NULL,
  CONSTRAINT table_pk       PRIMARY KEY (id) WITH (FILLFACTOR=100),
  CONSTRAINT table_uq_value UNIQUE (value)   WITH (FILLFACTOR=100)
) WITH (OIDS=FALSE);
ALTER TABLE "table" OWNER TO audit;
GRANT ALL ON TABLE "table" TO audit;
GRANT SELECT ON TABLE "table" TO auditor;

CREATE TABLE "user" (
  id    smallint NOT NULL DEFAULT nextval('seq_elems'::regclass),
  value name NOT NULL DEFAULT "current_user"(),
  CONSTRAINT user_pk       PRIMARY KEY (id) WITH (FILLFACTOR=100),
  CONSTRAINT user_uq_value UNIQUE (value)   WITH (FILLFACTOR=95)
) WITH (OIDS=FALSE);
ALTER TABLE "user" OWNER TO audit;
GRANT ALL ON TABLE "user" TO audit;
GRANT SELECT ON TABLE "user" TO auditor;

-- Main log: one row per audited event.
CREATE TABLE audit (
  id          bigint,
  type        character(1),
  tstmp       timestamp with time zone DEFAULT now(),
  schema      smallint,
  "table"     smallint,
  "user"      smallint,
  client_inet smallint,
  client_port integer DEFAULT inet_client_port(),
  pid         integer DEFAULT pg_backend_pid()
) WITH (OIDS=FALSE);
ALTER TABLE audit OWNER TO audit;
GRANT ALL ON TABLE audit TO audit;
GRANT SELECT ON TABLE audit TO auditor;

-- Log detail: one row per logged column of each event.
CREATE TABLE audet (
  id     bigint,
  field  smallint,
  is_pk  boolean,
  before text,
  after  text
) WITH (OIDS=FALSE);
ALTER TABLE audet OWNER TO audit;
GRANT ALL ON TABLE audet TO audit;
GRANT SELECT ON TABLE audet TO auditor;

-- Redirect inserts on audet to the current rotation child table,
-- whose suffix is passed as the trigger argument.
CREATE OR REPLACE FUNCTION tgf_ins_audet() RETURNS trigger AS
$BODY$
begin
  execute E'insert into audet_' || tg_argv[0]
    || E' (id, field, is_pk, before, after) values ('
    || coalesce(new.id::text, 'NULL')    || E', '
    || coalesce(new.field::text, 'NULL') || E', '
    || coalesce(new.is_pk::text, 'NULL') || E', '
    || coalesce(quote_literal(new.before), 'NULL') || E', '
    || coalesce(quote_literal(new.after), 'NULL')  || E')';
  return null;
end;
$BODY$ LANGUAGE plpgsql VOLATILE;
ALTER FUNCTION tgf_ins_audet() SET search_path = audit;
ALTER FUNCTION tgf_ins_audet() OWNER TO audit;
GRANT EXECUTE ON FUNCTION tgf_ins_audet() TO audit;

-- Same redirection for audit.
CREATE OR REPLACE FUNCTION tgf_ins_audit() RETURNS trigger AS
$BODY$
begin
  execute E'insert into audit_' || tg_argv[0]
    || E' (id, type, tstmp, schema, "table", "user", client_inet, client_port, pid) values ('
    || coalesce(new.id::text, 'NULL')             || E', '
    || coalesce(quote_literal(new.type), 'NULL')  || E', '
    || coalesce(quote_literal(new.tstmp), 'NULL') || E', '
    || coalesce(new.schema::text, 'NULL')         || E', '
    || coalesce(new."table"::text, 'NULL')        || E', '
    || coalesce(new."user"::text, 'NULL')         || E', '
    || coalesce(new.client_inet::text, 'NULL')    || E', '
    || coalesce(new.client_port::text, 'NULL')    || E', '
    || coalesce(new.pid::text, 'NULL')            || E')';
  return null;
end;
$BODY$ LANGUAGE plpgsql VOLATILE;
ALTER FUNCTION tgf_ins_audit() SET search_path = audit;
ALTER FUNCTION tgf_ins_audit() OWNER TO audit;
GRANT EXECUTE ON FUNCTION tgf_ins_audit() TO audit;

CREATE TRIGGER tg_audit_20110518 BEFORE INSERT ON audit
  FOR EACH ROW EXECUTE PROCEDURE tgf_ins_audit(20110518);
CREATE TRIGGER tg_audet_20110907 BEFORE INSERT ON audet
  FOR EACH ROW EXECUTE PROCEDURE tgf_ins_audet(20110907);

CREATE OR REPLACE FUNCTION rotate(character) RETURNS void AS
$BODY$
/* Created by Diego Augusto Molina in 2011 for Tucuman Government, Argentina. */
declare
  first_execution boolean := false;
  cur_start char(8) := null;
  cur_tstmp_min timestamp with time zone;
  cur_tstmp_max timestamp with time zone;
  cur_id_min bigint;
  cur_id_max bigint;
  new_start char(8);
begin
  /* Determine the creation tstmp of the tables =============================================== */
  select substring(max(c.relname::text) from $1 || E'_(........)')
    into cur_start
    from pg_namespace n
    inner join pg_class c on (n.oid = c.relnamespace)
    where n.nspname = 'audit'::name
      and c.relname::text like $1 || '_%';
  if cur_start is null then
    first_execution := true;
    cur_start := '';
  end if;
  new_start := cast(to_char(current_timestamp, 'YYYYMMDD') as name);

  case $1
  when 'audit' then
    /* Rotating the table "audit" ============================================================= */
    if not first_execution then
      /* current table */
      execute 'select min(tstmp), max(tstmp) from audit_' || cur_start
        into cur_tstmp_min, cur_tstmp_max;
      execute $$
        alter index idx_audit_$$|| cur_start ||$$_id set (fillfactor = 100);
        alter index idx_audit_$$|| cur_start ||$$_tstmp set (fillfactor = 100);
        alter index idx_audit_$$|| cur_start ||$$_schema__table set (fillfactor = 100);
        alter index idx_audit_$$|| cur_start ||$$_user set (fillfactor = 100);
        cluster audit_$$|| cur_start ||$$;
        analyze audit_$$|| cur_start ||$$;
        alter table audit_$$|| cur_start ||$$
          add constraint audit_$$|| cur_start ||$$_ck_exclusion
          check ( tstmp >= '$$|| cur_tstmp_min ||$$'
              and tstmp <= '$$|| cur_tstmp_max ||$$' )
      $$;
    end if;
    execute $$
      /* new table */
      create table audit_$$|| new_start ||$$ () inherits (audit);
      create index idx_audit_$$|| new_start ||$$_id
        on audit_$$|| new_start ||$$ using btree (id) with (fillfactor = 99);
      create index idx_audit_$$|| new_start ||$$_tstmp
        on audit_$$|| new_start ||$$ using btree (tstmp) with (fillfactor = 99);
      create index idx_audit_$$|| new_start ||$$_schema__table
        on audit_$$|| new_start ||$$ using btree (schema, "table") with (fillfactor = 95);
      create index idx_audit_$$|| new_start ||$$_user
        on audit_$$|| new_start ||$$ using btree ("user") with (fillfactor = 95);
      cluster audit_$$|| new_start ||$$ using idx_audit_$$|| new_start ||$$_tstmp;
      /* Parent table */
      drop trigger if exists tg_audit_$$|| cur_start ||$$ on audit;
      create trigger tg_audit_$$|| new_start ||$$ before insert on audit
        for each row execute procedure tgf_ins_audit('$$|| new_start ||$$');
    $$;
  when 'audet' then
    /* Rotating the table "audet" ============================================================= */
    if not first_execution then
      /* current table */
      execute 'select min(id), max(id) from audet_' || cur_start
        into cur_id_min, cur_id_max;
      execute $$
        alter index idx_audet_$$|| cur_start ||$$_id set (fillfactor = 100);
        alter index idx_audet_$$|| cur_start ||$$_fieldpk set (fillfactor = 100);
        cluster audet_$$|| cur_start ||$$;
        analyze audet_$$|| cur_start ||$$;
        alter table audet_$$|| cur_start ||$$
          add constraint audet_$$|| cur_start ||$$_ck_exclusion
          check ( id >= '$$|| cur_id_min ||$$' and id <= '$$|| cur_id_max ||$$' );
        /* Parent table */
        drop trigger tg_audet_$$|| cur_start ||$$ on audet;
      $$;
    end if;
    execute $$
      /* new table */
      create table audet_$$|| new_start ||$$ () inherits (audet);
      create index idx_audet_$$|| new_start ||$$_id
        on audet_$$|| new_start ||$$ using btree (id) with (fillfactor = 99);
      create index idx_audet_$$|| new_start ||$$_fieldpk
        on audet_$$|| new_start ||$$ using btree (field) with (fillfactor = 99)
        where is_pk;
      cluster audet_$$|| new_start ||$$ using idx_audet_$$|| new_start ||$$_id;
      /* Parent table */
      create trigger tg_audet_$$|| new_start ||$$ before insert on audet
        for each row execute procedure tgf_ins_audet('$$|| new_start ||$$');
    $$;
  else
    /* wrong argument ========================================================================= */
    raise notice E'Error: expected \'audit\' or \'audet\'. Got \'%\'.', $1;
    return;
  end case;
end;
$BODY$ LANGUAGE plpgsql VOLATILE SECURITY DEFINER;
ALTER FUNCTION rotate(character) SET search_path = audit;
ALTER FUNCTION rotate(character) OWNER TO audit;
GRANT EXECUTE ON FUNCTION rotate(character) TO audit;

CREATE OR REPLACE FUNCTION audit() RETURNS trigger AS
$BODY$
## Created by Diego Augusto Molina in 2011 for Tucuman Government, Argentina.
## This is the trigger function which should be called by any table we want to audit. If it
## receives arguments, they will be interpreted as the primary key of the table. If any of the
## arguments is not a column of the table, or no arguments are received at all, a probing
## process determines the pk of the table. Thus, it is better to create the trigger calling
## this function with no arguments. See the TODO list.
## Usage:
##   CREATE TRIGGER <tg_name> AFTER {INSERT | UPDATE | DELETE} [ OR ...] ON <table_name>
##     FOR EACH ROW EXECUTE PROCEDURE audit.audit( { <column_list> | <nothing_at_all> } );
## KNOWN ISSUE #1: you don't want to use this trigger on a table which has a column of some
##   type that doesn't have an implicit cast to string. That would cause a runtime error,
##   killing your transaction! See TODO #2. Rest easy: most 'common' types have an implicit
##   cast to string by default.
## TODO #1: in 'P3', instead of asking '( scalar keys %pk == 0 )' each time, put an 'else'.
## TODO #1.1: if the pk was not passed as argument, at the end of the probing execute an
##   'alter trigger' so that next time there's no probing at all. This would be unfriendly
##   with modifications to the table definition, which should carry an update of the trigger
##   (putting no arguments at all would imply probing again just the first time, after which
##   we simply use the result).
## TODO #2: search for a way to save the binary contents of the columns instead of the
##   formatted content. The table 'field' would have an extra column of type 'type' which
##   would help describe the field audited. That would solve the problem with strange fields
##   (assuming that _any_ value can be converted to its binary/internal representation).
##   This may carry some extra complexity, maybe needing extra tables holding information
##   about types.
## TODO #3: make this function receive only two parameters: two arrays of type name[], the
##   first holding the set of columns which make up the primary key, the second the set of
##   columns which, in addition to the pk ones, are to be registered. Note that pk columns
##   will always be registered because they identify the modified tuple.
## TODO #4: support for the TRUNCATE event.

# P0. Declarations and general definitions ###################################################
my $elog_pref = "(schm:$_TD->{table_schema};tab:$_TD->{table_name};trg:$_TD->{name};evt:$_TD->{event}):";
my $rv  = "";  # Query execution
my $val = "";  # Iterating value
my %tables = ( # Values for the respective dictionary tables referenced from "audit"
  "user"        => 'pg_catalog."session_user"()',
  "table"       => "'$_TD->{table_name}'",
  "schema"      => "'$_TD->{table_schema}'",
  "client_inet" => "pg_catalog.inet_client_addr()"
);
my $id     = "";  # Id of the tuple inserted in "audit"
my $field  = "";  # Field id
my $is_pk  = 0;   # Determines if a field is part of the PK
my $before = "";  # Value of a field in OLD
my $after  = "";  # Value of a field in NEW
my %cols   = ();  # Columns of the table
my %pk     = ();  # Primary key

# Copy columns from whichever transition variable is available -------------------------------
if (exists $_TD->{new}){
  %cols = %{$_TD->{new}};
} else {
  %cols = %{$_TD->{old}};
}

# P1. Create the necessary tuples in "user", "table", "schema" and "client_inet" #############
foreach $val (keys %tables){
  $rv = spi_exec_query("select id from \"$val\" where value = $tables{$val}");
  if ( $rv->{status} ne SPI_OK_SELECT ){
    elog(ERROR, "$elog_pref Error querying table '$val'.");
  }
  if ( $rv->{processed} == 1 ){
    $tables{$val} = $rv->{rows}[0]->{id};
  } else {
    $rv = spi_exec_query("insert into \"$val\" (value) values ($tables{$val}) returning id");
    if ( $rv->{status} ne SPI_OK_INSERT_RETURNING ){
      elog(ERROR, "$elog_pref Error inserting in table '$val'.");
    }
    $tables{$val} = $rv->{rows}[0]->{id};
  }
}

# P2. Insert in "audit" ######################################################################
$rv = spi_exec_query("select nextval('seq_audit'::regclass) as id");
if ( $rv->{status} ne SPI_OK_SELECT ){
  elog(ERROR, "$elog_pref Error querying next value of sequence 'seq_audit'.");
}
$id = $rv->{rows}[0]->{id};
$rv = spi_exec_query("insert into audit (id, type, schema, \"table\", \"user\", client_inet)
  values ( $id, substring('$_TD->{event}', 1, 1), $tables{'schema'}, $tables{'table'},
           $tables{'user'}, $tables{'client_inet'} )");
if ( $rv->{status} ne SPI_OK_INSERT ){
  elog(ERROR, "$elog_pref Error inserting tuple in table 'audit'.");
}

# P3. Determine the PK of the table ##########################################################
if ( scalar keys %pk == 0 ){
  # Criterion 1: if we got params, each of them is a column of the table, and all of them
  # make up the pk of the table --------------------------------------------------------------
  elog(DEBUG, "$elog_pref Searching pk in the trigger's params.");
  if ( $_TD->{argc} > 0 ){
    ARGS: foreach $val ( @{$_TD->{args}} ){
      if (exists $cols{$val}){
        $pk{$val} = "-";
      } else {
        %pk = ();
        elog(DEBUG, "$elog_pref The column '$val' given as argument does not exist. Skipping to next criterion.");
        last ARGS;
      }
    }
  }
}
if ( scalar keys %pk == 0 ){
  # Criterion 2: search for the pk in the system catalogs ------------------------------------
  elog(DEBUG, "$elog_pref Searching pk in system catalogs.");
  $rv = spi_exec_query("
    select a.attname
    from (
      select cl.oid, unnest(c.conkey) as att
      from pg_catalog.pg_constraint c
      inner join pg_catalog.pg_class cl on (c.conrelid = cl.oid)
      where c.contype = 'p' and cl.oid = $_TD->{relid}
    ) as c
    inner join pg_catalog.pg_attribute a on (c.att = a.attnum and c.oid = a.attrelid)
  ");
  if ( $rv->{status} eq SPI_OK_SELECT ){
    if ( $rv->{processed} > 0 ){
      foreach $val ( @{$rv->{rows}} ){
        $pk{$val->{attname}} = "-";
      }
    }
  } else {
    elog(DEBUG, "$elog_pref Error querying the system catalogs. Skipping to next criterion.");
  }
}
if ( scalar keys %pk == 0 ){
  # Criterion 3: if the table has OIDs, use that as pk and emit a warning --------------------
  elog(DEBUG, "$elog_pref Searching OIDs in the table.");
  $rv = spi_exec_query("select * from pg_catalog.pg_class where oid = $_TD->{relid} and relhasoids = true");
  if ( $rv->{status} eq SPI_OK_SELECT ){
    if ( $rv->{processed} > 0 ){
      %pk = ("oid", "-");
      elog(DEBUG, "$elog_pref Using OIDs as table pk for '$_TD->{table_name}' because no previous criterion could find one.");
    }
  } else {
    elog(DEBUG, "$elog_pref Error querying the system catalogs. Skipping to next criterion.");
  }
}
if ( scalar keys %pk == 0 ){
  # Default criterion: all columns -----------------------------------------------------------
  elog(DEBUG, "$elog_pref Could not find a suitable pk. Logging every column.");
  %pk = %cols;
}

# P4. Insert in "audet" ######################################################################
foreach $val (keys %cols){
  $is_pk = 0 + exists($pk{$val});
  if ( $_TD->{event} ne "UPDATE" || $is_pk || $_TD->{new}{$val} ne $_TD->{old}{$val} ){
    $before = (exists $_TD->{old}) ? "'".$_TD->{old}{$val}."'" : "NULL";
    $after  = (exists $_TD->{new}) ? "'".$_TD->{new}{$val}."'" : "NULL";
    if ( $_TD->{event} eq "UPDATE" && $_TD->{new}{$val} eq $_TD->{old}{$val} ){
      # Don't save the previous state of a pk column on update if it hasn't changed.
      $before = "NULL";
    }
    $rv = spi_exec_query("select id from field where value = '$val'");
    if ( $rv->{status} ne SPI_OK_SELECT ){
      elog(ERROR, "$elog_pref Error querying table 'field'.");
    }
    if ( $rv->{processed} > 0 ){
      $field = $rv->{rows}[0]->{id};
    } else {
      $rv = spi_exec_query("insert into field (value) values ('$val') returning id");
      if ( $rv->{status} ne SPI_OK_INSERT_RETURNING ){
        elog(ERROR, "$elog_pref Error executing insert returning in table 'field'.");
      }
      $field = $rv->{rows}[0]->{id};
    }
    $rv = spi_exec_query("insert into audet (id, field, is_pk, before, after)
      values ($id, $field, cast($is_pk as boolean), cast($before as text), cast($after as text))");
    if ( $rv->{status} ne SPI_OK_INSERT ){
      elog(ERROR, "$elog_pref Error inserting tuples in table 'audet'.");
    }
  }
}

# P5. Finishing ##############################################################################
return;
$BODY$ LANGUAGE plperl VOLATILE SECURITY DEFINER;
ALTER FUNCTION audit() SET search_path = audit;
ALTER FUNCTION audit() OWNER TO audit;
GRANT EXECUTE ON FUNCTION audit() TO public;
Re: [Solved] Generic logging system for pre-hstore using plperl triggers
From: Diego Augusto Molina
Sh*#@, the code got wrapped! If you can suggest an external site to host the code, I can put it there and send the link. Maybe the wiki? However, I can't seem to publish on the wiki; I think I've followed all the steps but still can't. Anyway, there aren't many long lines, except for those which are comments.

Honestly, I don't remember why I used triggers instead of rules on the "audit" and "audet" tables.

--
Diego Augusto Molina
diegoaugustomolina@gmail.com
Re: [Solved] Generic logging system for pre-hstore using plperl triggers
From: Diego Augusto Molina
2011/9/27, Diego Augusto Molina <diegoaugustomolina@gmail.com>:
> Honestly, I don't remember why I used triggers instead of rules on the
> "audit" and "audet" tables.

I remember now: it's because in my case operations over tuples are done very lightly (one or two per statement at a time). So, for a case like that, rules end up being more expensive than triggers (right?).

--
Diego Augusto Molina
diegoaugustomolina@gmail.com
2011/9/27 Diego Augusto Molina <diegoaugustomolina@gmail.com>:
> /* Created by Diego Augusto Molina in 2011 for Tucuman Government,
> Argentina. */

OK, a few random comments:

*) when posting schema definitions, particularly in email format, try not to use dumped definitions from pg_dump or pgAdmin. This creates a lot of noise in the script that detracts from what you are trying to do. Also, an attached file would probably have been more appropriate.

*) using a smallint key for client_inet is really dubious. Why not just use the inet itself?

*) what is the audet table for? Are you truly storing a record for every field of every audited table? This will be incredibly inefficient, especially for large, wide tables.

*) surely, creating a table called 'table' is not a good idea.

*) this approach is obviously a lot more complicated than hstore. However, for 8.4 and down, hstore won't work. But what about just storing the record as text?

*) I can't quite follow the perl criterion steps -- what is happening there? What are the loops doing?

merlin
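The "record as text" alternative mentioned here can be sketched in a few lines of plpgsql. This is illustrative only, not part of the posted system; the names are made up, and it relies on casting a whole row to text (ROW::text), which flattens the tuple through each column's output function:

```sql
-- Minimal row-as-text audit log: one row per event, whole tuple as text.
CREATE TABLE simple_audit (
    tstmp  timestamptz DEFAULT now(),
    tab    regclass,
    op     text,
    olddat text,
    newdat text
);

CREATE OR REPLACE FUNCTION simple_audit_tg() RETURNS trigger AS $$
BEGIN
    -- Branch per event: NEW is unassigned on DELETE and OLD on INSERT,
    -- so each branch only touches the record it actually has.
    IF TG_OP = 'INSERT' THEN
        INSERT INTO simple_audit (tab, op, newdat)
        VALUES (TG_RELID::regclass, TG_OP, NEW::text);
    ELSIF TG_OP = 'UPDATE' THEN
        INSERT INTO simple_audit (tab, op, olddat, newdat)
        VALUES (TG_RELID::regclass, TG_OP, OLD::text, NEW::text);
    ELSE  -- DELETE
        INSERT INTO simple_audit (tab, op, olddat)
        VALUES (TG_RELID::regclass, TG_OP, OLD::text);
    END IF;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;
```

The trade-off discussed later in the thread applies: this is far simpler and narrower to write, but the logged tuple is opaque text, so tracking one tuple across pk changes becomes hard.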
On Tue, Sep 27, 2011 at 04:52:08PM -0300, Diego Augusto Molina wrote:
> 2011/9/27, Diego Augusto Molina <diegoaugustomolina@gmail.com>:
> > Honestly, I don't remember why I used triggers instead of rules on the
> > "audit" and "audet" tables.
>
> I remember now: it's because in my case operations over tuples are
> done very lightly (one or two per statement at a time). So, for a case
> like that, rules end up being more expensive than triggers (right?).

There's an even better reason not to use rules: they're going away in a not too distant version of PostgreSQL.

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: [Solved] Generic logging system for pre-hstore using plperl triggers
From: Diego Augusto Molina
2011/9/27, Merlin Moncure <mmoncure@gmail.com>:
> *) when posting schema definitions, particularly in email format, try
> not to use dumped definitions from pg_dump or pgadmin. This creates a
> lot of noise in the script that detracts from what you are trying to
> do. Also an attached file would probably have been more appropriate.

Right! I'm sending it attached, and from the source files instead of the pgAdmin III dump. Well, that'll be the next mail; I don't have the files right now.

> *) using smallint key for client_inet is really dubious. why not just
> use the inet itself?

Sure, this has two reasons:

1) I wanted the audit table to be as narrow as possible. Believe me, I migrated the tens of millions of tuples from the previous (slow, inefficient, extremely difficult to maintain) logging system onto a test server just to try things out: the narrower the table, the better it performs with searches. And I mean it! You wouldn't imagine. I don't know what I did with the annotated results, but I will search for them to share.

2) I put many things outside the audit table (the table, schema and field names too); that makes it narrower, but also makes it very easy to see all values without querying the audit table: I can see in one very small table all the inets from clients (and you could gather more info there if you liked). Note that for us most of the accesses to the database come from the web server, which implements a new version of a big application, so it would be a pity to allocate 5 extra bytes (7 bytes for IPv4 versus 2 for smallint) just to see mostly the same IP address. So, why bother logging the IP at all? Well, besides adding completeness, it allows us to see whether they were using the new application or the old one, which accessed the database server directly from the client's computer.

Other fields, namely client_port and pid, aren't mapped out to other tables because they don't increase the table width too much and because I wouldn't analyze those contents very often: like never, but some day there may be some kind of weird attack that needs to be analyzed with this data.

> *) what is the audet table for? Are you truly storing a record for
> every field of every audited table?

See the answer about the criteria.

> *) surely, creating a table called 'table' is not a good idea.

Ok, name it "tables" then.

> *) this approach obviously is a lot more complicated than hstore.
> however, for 8.4 and down, hstore won't work. but, what about just
> storing the record as text?

Third time lucky! See the next...

> *) I can't quite follow the perl criterion steps -- what is happening
> there? What are the loops doing?

We have some very wide tables (20 or more columns). We only log the value of two kinds of columns: i) those which make up the primary key of the table (which helps track down the alteration); and ii) those whose values change in the event. Note that columns in group i) can also be in group ii).

This carries the complexity of the criteria, which are meant to determine the primary key of the table at any cost. Each failing criterion makes the following one take place. These are the criteria I could think of:

1) If we got parameters, _check_ them and consider each of them one column of the primary key. This is the cheapest path and almost always the way through. We really have to check, because if there's e.g. some typo, the whole transaction outside the trigger would fail unconditionally, and we want this logging system to interfere as little as possible. A little less performance in exchange for some more stability.

2) Search the system catalogs for a primary key constraint.

3) Search the system catalogs for the unique constraint with the fewest columns (in fact, I think it should be "the narrowest unique constraint").

4) If the table has OIDs, use that and emit a warning (that's never a real pk, unless you make an external unique index, which I have no intention of checking right now).

5) The "else" (or "default") case is to log every column, emitting a warning (really guys, use primary keys! ;).

We wouldn't bear this complexity every time, but only once: if criterion 1) fails, then after determining the "primary key" we should execute a string which drops this same trigger and re-creates it, passing it the names of the columns which were determined to be the pk, so that next time we don't go further than 1). This works; I tried it out a while ago but never made the change (oops!). I mean dropping the trigger from the trigger itself (the docs say that 'alter trigger' can only rename it).

For my case, all this headache was needed: it was specified as one of the requirements of the logging system that every tuple should be trackable. Using a recursive "with" query, it is possible to track down the changes to any single tuple in the audited tables (or make a function, whatever). Fortunately, they never specified a maximum time for that ;). If instead we had made a string from the record, we wouldn't have been able to easily track the tuple.

Note also the "rotate(character)" function. It hot-rotates the audit/audet tables similarly to a logrotate program on *nix. You never stop logging, and you can dump the old table before dropping it (to save space) and restore it somewhere else to exploit it. We would usually rotate every 6 months or so, but that period keeps shortening as the system grows. The *real* system is very discreet with I/U/D operations: it's not usual to have more than one of those operations per second. For higher-transaction systems I think this logging system would be more noticeable (right now things go smoothly). As I said before, if you have test cases they're very welcome.

--
Diego Augusto Molina
diegoaugustomolina@gmail.com
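The recursive "with" query described above can be sketched as follows. This is a hedged illustration, not the author's actual query: it assumes a single-column pk, a hypothetical field name 'id' with current value '42', and for brevity it skips filtering by schema/table:

```sql
-- Collect every value the tuple's pk has ever had, walking backwards
-- from its current value through logged pk changes, then list all
-- logged changes touching any of those values.
WITH RECURSIVE pk_hist(pkval) AS (
      SELECT '42'::text                 -- current pk value (hypothetical)
    UNION
      SELECT d.before                   -- value the pk had before an update
      FROM pk_hist h
      JOIN audet d ON d.is_pk AND d.after = h.pkval AND d.before IS NOT NULL
)
SELECT a.tstmp, a.type, f.value AS fieldname, d.before, d.after
FROM audit a
JOIN audet d ON d.id = a.id
JOIN field f ON f.id = d.field
WHERE a.id IN (SELECT d2.id
               FROM audet d2
               JOIN pk_hist h ON d2.is_pk AND d2.after = h.pkval)
ORDER BY a.tstmp, a.id;
```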
Rules going away (was: [Solved] Generic logging system for pre-hstore using plperl triggers)
From: Andrew Sullivan
On Wed, Sep 28, 2011 at 06:20:04AM -0700, David Fetter wrote:
>
> There's an even better reason not to use rules: they're going away in
> a not too distant version of PostgreSQL.

Really? How? I thought views were done using rules under the hood?

Also, it'd be awfully nice if, in case rules are going away, the documentation actually said prominently in the rules section, "Rules are deprecated and are planned to be removed at some point in the future." Right now, anyone coming to Postgres for the first time could easily understand the manual to say that Postgres has this cool feature on which they can rely.

Best,

A

--
Andrew Sullivan
ajs@crankycanuck.ca
Re: Rules going away (was: [Solved] Generic logging system for pre-hstore using plperl triggers)
From: Tom Lane
Andrew Sullivan <ajs@crankycanuck.ca> writes:
> On Wed, Sep 28, 2011 at 06:20:04AM -0700, David Fetter wrote:
>> There's an even better reason not to use rules: they're going away in
>> a not too distant version of PostgreSQL.

> Really? How? I thought views were done using rules under the hood?

> Also, it'd be awfully nice if, in case rules are going away, the
> documentation actually said prominently in the rules section, "Rules
> are deprecated and are planned to be removed at some point in the
> future." Right now, anyone coming to Postgres for the first time
> could easily understand the manual to say that Postgres has this cool
> feature on which they can rely.

I think the true state of affairs is this: rules have a lot of surprising behaviors, and if we could think of something that works more straightforwardly, we'd love to replace them. But I think we'd have to have the "something" in place before we consider deprecating rules. At the moment we don't even have a glimmer of a design, so David's statement is many years premature.

regards, tom lane
Re: Rules going away (was: [Solved] Generic logging system for pre-hstore using plperl triggers)
From
Andrew Sullivan
Date:
On Wed, Sep 28, 2011 at 10:34:32AM -0400, Tom Lane wrote: > I think the true state of affairs is this: rules have a lot of > surprising behaviors, and if we could think of something that works more > straightforwardly, we'd love to replace them. Oh. Well, _that's_ not news :-) -- Andrew Sullivan ajs@crankycanuck.ca
On Wed, Sep 28, 2011 at 8:20 AM, Diego Augusto Molina <diegoaugustomolina@gmail.com> wrote: > 2011/9/27, Merlin Moncure <mmoncure@gmail.com>: >> *) when posting schema definitions, particularly in email format, try >> not to use dumped definitions from pg_dump or pgadmin. This creates a >> lot of noise in the script that detracts from what you are trying to >> do. Also an attached file would probably have been more appropriate. > > Right! I'm sending it attached and from the source files instead of > the pgAdminIII dump. Well, that'll be the next mail, I don't have the > files right now. > >> *) using smallint key for client_inet is really dubious. why not just >> use the inet itself? > > Sure, this has two reasons: > > 1) I wanted the audit table to be as narrow as possible. Believe me, I > migrated the tens of millions of tuples from the previous (slow, > inefficient, extremely difficult to maintain) logging system on a test > server just to try things out: the narrower the table, the better it > performs with searches. And I mean it! You wouldn't imagine. I don't > know what I did with the annotated results, but I will search for them > to share that. > > 2) I put many things outside the audit table (like the table, schema > and field names too); that makes it narrower but also makes it very > easy to see all values without querying the audit table: I can see in > a very small table all inet's from clients (and you could gather more > info if you would like too). Note that for us most of the accesses to > the database come from the web server which implements a new version > of a big application, so it would be a pity to allocate 5 extra bytes > (7 bytes for IPv4, 2 for smallint) just to see mostly the same IP > address. So, why bother logging the IP at all? Well, besides adding > completeness, it allows us to see if they were using the new > application or the old one, which accessed the database server > directly from the client's computer. 
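[Editorial aside: the lookup-table normalization described above can be sketched roughly as follows; the table and column names here are hypothetical, not Diego's actual schema.]

```sql
-- Map each distinct client address to a 2-byte key, and store only that
-- key in the (hot, frequently-scanned) audit table.
CREATE TABLE client_inet (
    id   smallint PRIMARY KEY,
    addr inet NOT NULL UNIQUE
);

-- The audit table then carries the 2-byte key instead of a 7+ byte inet:
--   ..., client_inet smallint REFERENCES client_inet (id), ...
```

The same pattern applies to the schema, table, and field name lookups he mentions: the audit table stores only small integer keys, and the lookup tables double as a catalog of every value ever seen.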
> > Other fields, namely client_port and pid, aren't mapped out to other > tables because they do not increase the table width too much and > because I wouldn't analyze those contents very often: like never, but > some day there may be some kind of weird attack that needs to be > analyzed with this data. > >> *) what is the audet table for? Are you truly storing a record for >> every field of every audited table? This will be incredibly >> inefficient, especially for large, wide tables. > > See the answer about criteria. > >> *) surely, creating a table called 'table' is not a good idea. > > Ok, name it "tables" then. > >> *) this approach obviously is a lot more complicated than hstore. >> however, for 8.4 and down, hstore won't work. but, what about just >> storing the record as text? > > Third time lucky! see the next... > >> *) I can't quite follow the perl criterion steps -- what is happening >> there? What are the loops doing? > > We have some very wide tables (like 20 or more columns). We only log > the value of two kinds of columns: i) those which make up the primary > key of the table (which helps to track down the alteration); and ii) > those whose values change in the event. Note that columns in group i) > can also be in group ii). > This carries the complexity of the criteria, which are meant to > determine the primary key of the table at any cost. Each failing > criterion causes the following one to take place. These are the > criteria I could think of: > 1) If we got parameters, _check_ them and consider each of them as one > column of the primary key. This is the cheapest path and almost always > the way through. We really have to check, because if there's e.g. some typo > the whole transaction outside the trigger would fail unconditionally > and we want this logging system to interfere as little as > possible. A little less performance in exchange for some more > stability. > 2) Search in the system catalogs for a primary key constraint. 
> 3) Search in the system catalogs for the unique constraint which has > the fewest columns (in fact, I think it should be "the narrowest unique > constraint"). > 4) If the table has OIDs, use that and emit a warning (that's never a > real pk, unless you make an external unique index, which I don't > intend to check right now). > 5) The "else" (or "default") case is to log every column, emitting a > warning (Really guys, use primary keys! ;). > We don't bear this complexity every time but only once: if > criterion 1) fails, after determining the "primary key" we should > execute a string which drops this same trigger and re-creates it, > passing it the names of the columns which were determined to be the pk > so that the next time we don't go further than 1). This works, I tried > it out some time ago but never did the change (oops!). I mean, dropping the > trigger from the trigger itself (in the docs it says that 'alter > trigger' can only rename it). > > For my case, all this headache was needed: it was specified as one of > the requirements of the logging system that every tuple should be > trackable. Using a recursive "with" query, it is possible to track > down the changes to any single tuple in the audited tables (or make a > function, whatever). Fortunately, they never specified a maximum time > for that ;). If instead we had made a string from the record, > we wouldn't have been able to easily track the tuple. > > Note also the "rotate(character)" function. It hot-rotates the > audit/audet tables similarly to a logrotate program in *nix. At the > same time, you never stop logging, and you can dump the old table > before dropping it (to save space) and restore it somewhere else to > exploit it. We would usually rotate every 6 months or so, but that > interval keeps shrinking as the system grows. > > The *real* system sees very few I/U/D operations: it's not > usual to have more than one of those operations per second. 
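[Editorial aside: criterion 2 above -- finding the primary key in the system catalogs -- can be done with a query along these lines. This is a sketch only, not necessarily the exact lookup the trigger performs, and the table name is a placeholder.]

```sql
-- Column names of a table's primary key, from the 8.x-era catalogs.
SELECT a.attname
FROM pg_constraint c
JOIN pg_attribute a
  ON a.attrelid = c.conrelid
 AND a.attnum   = ANY (c.conkey)
WHERE c.conrelid = 'public.some_table'::regclass
  AND c.contype  = 'p'          -- 'u' instead would serve criterion 3
ORDER BY a.attnum;              -- note: attnum order, not conkey order
```

Criterion 3 would be the same query with `contype = 'u'`, keeping the constraint with the fewest elements in `conkey`.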
For higher > transactional systems I think this logging system would be more > noticeable (right now things go smooth). As I said before, if you have > test cases they're very welcome. Yup -- I get it now. Well, one point about this is that it seems directed towards your personal requirements. This is a very 'heavy' audit system that will not be suitable for high performance transactional systems. That said, it looks well thought out. Storing only the changed fields is pretty clever, but I can't help but wonder if you're better off using arrays for that purpose: create type audit_field_t as (field text, old_value text, new_value text); and inside the audit table itself having fields audit_field_t, and, if at all possible, constructing the array of audit fields in a single expression. This will be much more compact than one record per field -- normally, arrays in table definitions tend to be bad mojo but this is one case they could be useful. Audit records are WORM, 'Write Once Read Maybe', so compactness is important. Obviously, for 9.0+, I would be rigging a solution around hstore for an 'all sql' solution which is usually better if you can get away with it. merlin
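[Editorial aside: Merlin's suggestion spelled out as DDL. This is a sketch -- the columns besides `fields` are hypothetical, and arrays of composites require 8.3 or later, as noted further down the thread.]

```sql
CREATE TYPE audit_field_t AS (field text, old_value text, new_value text);

CREATE TABLE audit (
    id        bigserial   PRIMARY KEY,
    tstamp    timestamptz NOT NULL DEFAULT now(),
    tablename text        NOT NULL,
    fields    audit_field_t[]  -- one element per changed column, ideally
                               -- built in a single expression in the trigger
);
```

The point of the array is compactness: one audit row per statement-affected tuple, rather than one audet row per changed field.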
Re: [Solved] Generic logging system for pre-hstore using plperl triggers
From
Diego Augusto Molina
Date:
2011/9/28, Merlin Moncure <mmoncure@gmail.com>: > Yup -- I get it now. Well, one point about this is that it seems > directed towards your personal requirements. This is a very 'heavy' > audit system that will not be suitable for high performance > transactional systems. That said, it looks well thought out. Storing > only the changed fields is pretty clever, but I can't help but wonder > if you're better off using arrays for that purpose: > > create type audit_field_t as (field text, old_value text, new_value text); > > and inside the audit table itself having > fields audit_field_t, > > and, if at all possible, constructing the array of audit fields in a > single expression. This will be much more compact than one record per > field -- normally, arrays in table definitions tend to be bad mojo but > this is one case they could be useful. Audit records are WORM, 'Write > Once Read Maybe', so compactness is important. Obviously, for 9.0+, > I would be rigging a solution around hstore for an 'all sql' solution > which is usually better if you can get away with it. > > merlin > Well that sounds pretty fair to me. But that flow would not allow me to make partial indexes on primary key fields. As you can see in the "audet" table, there's a column named "is_pk" which tells if that column was considered a primary key at the moment of the logging. Normally there are no indexes, but when I have to make some audits I do the following: 1) Dump the audits. 2) Restore somewhere else. 3) Generate some indexes on: timestamp, schema|table, field|is_pk and id (I think, I've got the procedure annotated too, but not here hehe). This indexing is a pain sometimes but even adding it to the time it takes to run one query it is really cheap. Making the indexes gets far more necessary if you run more than one query (which is probably the case). I had considered the solution you're posting, but it would be a _real_ pain to run a query with 'unnest's and 'array_agg's. 
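[Editorial aside: steps 1)-3) above might look like this on the analysis server, after restoring the dump. The column names are guesses at the schema from the audit/audet split described in the thread.]

```sql
-- Post-restore indexing for ad-hoc audit queries.
CREATE INDEX audit_tstamp_idx ON audit (tstamp);
CREATE INDEX audit_table_idx  ON audit (schema_name, table_name);
CREATE INDEX audet_pk_idx     ON audet (field) WHERE is_pk;  -- partial: PK columns only
CREATE INDEX audet_audit_idx  ON audet (audit_id);           -- join back to audit
```

The partial index on `is_pk` is the one the array layout would lose, which is Diego's objection here.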
Also, note that some of these may not be available in versions of PostgreSQL prior to 8.4 (I think), so if you're planning to track the tupple you won't be able to do it in clear (maybe using temp tables). But! all those arguments above get beat by only one you asserted: that "WORM" thing. You are defintly right about that. Logging in the majority of the cases should be meaningful, light to run, compact/compressed, and rotated so that it doesn't take up all your space with time. Having said that, I'm going to take your advice for the next version, which I hope that also checks some TODO's in the list. When I get home I'll send the current code attached and when I get some fresh air at work I'll make the changes and post the new version. Any other ideas for the new version? (get some previews in the TODO list at the top of the perl trigger function in the attachment of the next mail). -- Diego Augusto Molina diegoaugustomolina@gmail.com ES: Por favor, evite adjuntar documentos de Microsoft Office. Serán desestimados. EN: Please, avoid attaching Microsoft Office documents. They shall be discarded. LINK: http://www.gnu.org/philosophy/no-word-attachments.html
On 09/28/2011 08:34 AM, Tom Lane wrote: > Andrew Sullivan <ajs@crankycanuck.ca> writes: >> On Wed, Sep 28, 2011 at 06:20:04AM -0700, David Fetter wrote: >>> There's an even better reason not to use rules: they're going away in >>> a not too distant version of PostgreSQL. >> Really? How? I thought views were done using rules under the hood? >> Also, it'd be awfully nice if, in case rules are going away, the >> documentation actually said prominently in the rules section, "Rules >> are deprecated and are planned to be removed at some point in the >> future." Right now, anyone coming to Postgres for the first time >> could easily understand the manual to say that Postgres has this cool >> feature on which they can rely. > I think the true state of affairs is this: rules have a lot of > surprising behaviors, and if we could think of something that works more > straightforwardly, we'd love to replace them. But I think we'd have to > have the "something" in place before we consider deprecating rules. > At the moment we don't even have a glimmer of a design, so David's > statement is many years premature. > > regards, tom lane > Yoda: "Damned by faint praise, these rules are" Would this future something more likely be a totally different concept or a re-implementation? Is there a list of "gotchas" w.r.t. rules?
On Wed, Sep 28, 2011 at 11:45 AM, Diego Augusto Molina <diegoaugustomolina@gmail.com> wrote: > 2011/9/28, Merlin Moncure <mmoncure@gmail.com>: >> Yup -- I get it now. Well, one point about this is that it seems >> directed towards your personal requirements. This is a very 'heavy' >> audit system that will not be suitable for high performance >> transactional systems. That said, it looks well thought out. Storing >> only the changed fields is pretty clever, but I can't help but wonder >> if you're better off using arrays for that purpose: >> >> create type audit_field_t as (field text, old_value text, new_value text); >> >> and inside the audit table itself having >> fields audit_field_t, >> >> and, if at all possible, constructing the array of audit fields in a >> single expression. This will be much more compact than one record per >> field -- normally, arrays in table definitions tend to be bad mojo but >> this is one case they could be useful. Audit records are WORM, 'Write >> Once Read Maybe', so compactness is important. Obviously, for 9.0+, >> I would be rigging a solution around hstore for an 'all sql' solution >> which is usually better if you can get away with it. >> >> merlin >> > > Well that sounds pretty fair to me. But that flow would not allow me > to make partial indexes on primary key fields. As you can see in the > "audet" table, there's a column named "is_pk" which tells if that > column was considered a primary key at the moment of the logging. > Normally there's no indexes, but when I have to make some audits I do > the following: > 1) Dump the audits. > 2) Restore somewhere else. > 3) Generate some indexes on: timestamp, schema|table, field|is_pk and > id (I think, I've got the procedure annotated too, but not here hehe). > This indexing is a pain sometimes but even adding it to the time it > takes to run one query it is really cheap. Making the indexes gets far > more necessary if you run more than one query (which is probably the > case). 
> I had considered the solution you're posting, but it would get a > _real_ pain to run a query with 'unnest's and 'array_agg's. Also, note > that some of these may not be available in versions of PostgreSQL > prior to 8.4 (I think), so if you're planning to track the tupple you > won't be able to do it in clear (maybe using temp tables). I disagree. unnest() and array_agg() (or, even better, array() constructor syntax) are an absolute joy to work with and thinking in a more functional way, which is usually the key to making things run quickly. Also both functions are trivial to emulate in userland for compatibility. Arrays of composites IIRC only go back to 8.3 so that would be a true stopper for any solution in that vein. As for the rest of it, I'd be looking to try and come up with an all sql implementation. Also you should give an honest comparison between what you've come up with vs. this: http://pgfoundry.org/projects/tablelog/. merlin
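[Editorial aside: with the `audit_field_t[]` layout, expanding audit rows back to one row per changed field is a short query. This sketch uses a set-returning `unnest()` in a subquery, which works on 8.4 without LATERAL; column names are hypothetical.]

```sql
-- One output row per changed field.
SELECT id, tstamp, (f).field, (f).old_value, (f).new_value
FROM (SELECT id, tstamp, unnest(fields) AS f
      FROM audit) s;
```

On 9.3+ the same thing reads more naturally as `FROM audit, LATERAL unnest(fields) AS f`.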
On Wed, Sep 28, 2011 at 10:53 AM, Rob Sargent <robjsargent@gmail.com> wrote: > > > On 09/28/2011 08:34 AM, Tom Lane wrote: >> Andrew Sullivan <ajs@crankycanuck.ca> writes: >>> On Wed, Sep 28, 2011 at 06:20:04AM -0700, David Fetter wrote: >>>> There's an even better reason not to use rules: they're going away in >>>> a not too distant version of PostgreSQL. >>> Really? How? I thought views were done using rules under the hood? >>> Also, it'd be awfully nice if, in case rules are going away, the >>> documentation actually said prominently in the rules section, "Rules >>> are deprecated and are planned to be removed at some point in the >>> future." Right now, anyone coming to Postgres for the first time >>> could easily understand the manual to say that Postgres has this cool >>> feature on which they can rely. >> I think the true state of affairs is this: rules have a lot of >> surprising behaviors, and if we could think of something that works more >> straightforwardly, we'd love to replace them. But I think we'd have to >> have the "something" in place before we consider deprecating rules. >> At the moment we don't even have a glimmer of a design, so David's >> statement is many years premature. >> >> regards, tom lane >> > Yoda: "Damned by faint praise, these rules are" > > Would this future something more likely be a totally different concept > or a re-implementation? > > Is there a list of "gotchas" w.r.t. rules? yes, and it is huge -- see the archives. note that views are based on rules (SELECT rules) and there is nothing wrong there -- so it's not quite correct to say they will be going away completely. view triggers removed the #1 thing that most people wanted to do with rules, namely updatable views. the basic problem with rules is that by hacking the sql you send to the server, you lose control over various aspects of the statement that normally are fairly predictable. they are almost impossible to get working properly -- I've tried many times. 
aside from the multiple evaluation thing, you have bizarre interactions with many sql features that came in later (like RETURNING). folks, don't use RULES! use triggers -- and as much as possible, keep triggers simple, short, and to the point (simple validation, custom RI, auditing/logging, etc). merlin
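[Editorial aside: as an illustration of "simple, short, and to the point" -- this is not the plperl system from this thread, just a bare-bones plpgsql sketch with hypothetical table and column names, assuming the audited table has an integer `id` primary key.]

```sql
CREATE TABLE audit_log (
    tstamp    timestamptz NOT NULL DEFAULT now(),
    tablename text        NOT NULL,
    op        text        NOT NULL,   -- 'INSERT' / 'UPDATE' / 'DELETE'
    row_pk    integer     NOT NULL
);

CREATE OR REPLACE FUNCTION log_change() RETURNS trigger AS $$
BEGIN
    INSERT INTO audit_log (tablename, op, row_pk)
    VALUES (TG_TABLE_NAME, TG_OP,
            CASE WHEN TG_OP = 'DELETE' THEN OLD.id ELSE NEW.id END);
    RETURN NULL;  -- AFTER row trigger: return value is ignored
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER some_table_audit
AFTER INSERT OR UPDATE OR DELETE ON some_table
FOR EACH ROW EXECUTE PROCEDURE log_change();
```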
Hi, > folks, don't use RULES! use triggers -- and as much as possible, keep > triggers simple, short, and to the point (simple validation, custom > RI, auditing/logging, etc). I like them :). 'DO INSTEAD' rules are great for partitioning so you can insert (or update) into the parent table and the 'DO INSTEAD' rule takes care of the rest. -- Ondrej Ivanic (ondrej.ivanic@gmail.com)
> -----Original Message----- > From: Ondrej Ivanič [mailto:ondrej.ivanic@gmail.com] > Sent: Wednesday, September 28, 2011 6:47 PM > To: pgsql-general@postgresql.org > Subject: Re: Rules going away > > Hi, > > > folks, don't use RULES! use triggers -- and as much as possible, keep > > triggers simple, short, and to the point (simple validation, custom > > RI, auditing/logging, etc). > > I like them :). 'DO INSTEAD' rules are great for partitioning so you > can insert (or update) to parent table and 'DO INSTEAD' rule takes > care about the rest. > > -- > Ondrej Ivanic > (ondrej.ivanic@gmail.com) Years ago (PG version 8.2), I was working on implementing partitioned tables, and evaluated whether to use triggers or rules to "automagically" redirect inserts/updates/deletes from the main table to the appropriate partition based on the value of the partitioning key. Unfortunately, I didn't document my research, but the result was that triggers were unusable for my purposes, and I ended up with the rules calling functions where, using dynamic sql, I decide which partition should be used for insert/update/delete (at least on "selects" PG takes care of choosing the proper partition, of course with a properly set configuration parameter). These rules (and the functions they call) have worked perfectly for me all these years in many, many, many installations. So, until PG takes care of the problem I'm solving with the rules (implementing a "complete" partitioning feature, and not just on "selects"), please leave rules where they are; they definitely have their use when properly implemented for a specific purpose. Regards, Igor Neyman
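[Editorial aside: the pattern Igor describes -- a rule handing the row to a function that picks the partition with dynamic SQL -- looks roughly like this. These are hypothetical names and a sketch of the approach, not his actual code; `EXECUTE ... USING` is 8.4+, so an 8.2 version would concatenate the values instead.]

```sql
-- Route an insert on the parent to a monthly partition.
CREATE OR REPLACE FUNCTION route_insert(ts timestamptz, val numeric)
RETURNS void AS $$
BEGIN
    -- Dynamic SQL: the partition name is derived from the partitioning key.
    EXECUTE 'INSERT INTO measurements_' || to_char(ts, 'YYYYMM')
         || ' VALUES ($1, $2)' USING ts, val;
END;
$$ LANGUAGE plpgsql;

CREATE RULE measurements_ins AS ON INSERT TO measurements
    DO INSTEAD SELECT route_insert(NEW.tstamp, NEW.value);
```

As Gregg notes just below, the known limitation of this scheme is that `INSERT ... RETURNING` against the parent will not behave as expected.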
Speaking of DO INSTEAD, for the insert/update case: try using RETURNING with that and rules ;) Good luck
> -----Original Message----- > From: Gregg Jaskiewicz [mailto:gryzman@gmail.com] > Sent: Friday, September 30, 2011 5:18 AM > To: Igor Neyman > Cc: Ondrej Ivanič; pgsql-general@postgresql.org > Subject: Re: [GENERAL] Rules going away > > speaking of DO INSTEAD, for insert/update case. Try using RETURNING > with that and rules ;) Good luck Well, I designed it for a specific case. And in my application I had no need for the RETURNING clause. That's why I'm saying, it works perfectly for me, but I'm not trying to generalize, like those who say "Rules are evil and don't ever use them". Regards, Igor Neyman
You're right, rules are perfect for very limited and narrow cases. And they make it very hard to write complicated queries (e.g., updates that only touch a few columns, likewise with inserts). I'm guessing the upside is that rules are faster than triggers.
Re: [Solved] Generic logging system for pre-hstore using plperl triggers
From
Diego Augusto Molina
Date:
2011/9/28, Merlin Moncure <mmoncure@gmail.com>: > > I disagree. unnest() and array_agg() (or, even better, array() > constructor syntax) are an absolute joy to work with and thinking in a > more functional way, which is usually the key to making things run > quickly. Also both functions are trivial to emulate in userland for > compatibility. Arrays of composites IIRC only go back to 8.3 so that > would be a true stopper for any solution in that vein. Ok, tastes are tastes: I hate to make two or three more levels of subqueries. Regarding arrays of composites, that would be perfectly solved if we use no composite at all! Instead of a field with an array of a composite of three intrinsics, use three fields, each of an intrinsic type. See your proposal: >> create type audit_field_t as (field text, old_value text, new_value text); Instead, in the audit table you may use: ..., field smallint[], before text[], after text[],... Note the smallint in field, that means I really want to keep the reference to the "field" table. That is for the two reasons I had mentioned earlier (to reduce space: 2 bytes of type "smallint" against variable size of type "text"; and to keep track of names been used too). You can also set up something like this if you like dimensions: ..., field smallint[], values text[][],... Implying that the first dimension is the "before" value and the second one is for the "after" value. Any of these prevents us from using composites and makes the box a little wider and simpler. Even further, I would like to keep the logging "on demand": ..., field smallint[], is_pk boolean[], { before text[], after text[] | values text[][] },... You know what the braces and pipe are for... So, at the end, we have the entire "audet" table inside the "audit" table, as a series of arrays. We get a really compact table with only enough data to fully log the changes which triggered the event. No less, no more. 
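[Editorial aside: spelled out, the compacted single-table layout proposed above would be something like the following. This is hypothetical DDL merging audet into audit; the lookup-table names follow the earlier discussion.]

```sql
CREATE TABLE audit (
    id        bigserial   PRIMARY KEY,
    tstamp    timestamptz NOT NULL DEFAULT now(),
    tableid   smallint    NOT NULL,  -- FK to the "tables" lookup table
    field     smallint[]  NOT NULL,  -- keys into the "field" lookup, by position
    is_pk     boolean[]   NOT NULL,  -- aligned with field[]
    before    text[],                -- old values, aligned with field[]
    after     text[]                 -- new values, aligned with field[]
);
```

All four arrays must stay positionally aligned: element N of each array describes the same column of the audited tuple.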
At this point we know querying this table will be much slower and rotation will have to be done more frequently. If we dump>restore the table somewhere else we will still be able to split the table back into the original two, and make indexes, cluster them, and query as desired. But this can get so complicated that maybe I should implement a function doing all this. In any event, we lose some responsiveness because of this. But a couple of minutes more may not be a problem for most cases. I'm just trying to summarize. As a rule of thumb, you may need to run a cron job every night or so to check if 'select count(*) from audit' is bigger than X and then rotate the table (or maybe every X days/weeks/etc.). The smaller the X, the better the responsiveness _in_ some cases: if we know an interval in time we will just have to dump>restore those logs. In other cases this would not be of much help: if you need to track a tuple to the very beginning of time, you'll have a lot of work to do dumping>restoring (and so forth... remember to split the table, indexing...). Still, rotation seems to be a good practice, and you can include in the cron job the dump/restore part into another server and then delete the old table. That would save a lot of space in your production environment. > As for the rest of it, I'd be looking to try and come up with an all > sql implementation. Also you should give an honest comparison between > what you've come up with vs. this: > http://pgfoundry.org/projects/tablelog/. > > merlin > "All SQL implementation"? Didn't we agree that's not possible in pg<=8.4? then what do you mean by that? About project "tablelog", I didn't really try it, but read its documentation and it seems not appropriate at all for my case. First of all, its purpose seems to be to log everything in a table to be able to restore it later as of any time in the past. My purpose is to log in order to run analyses. 
Also, it needs to create one table per logged table, with the same structure as the logged table (without constraints) plus three, four or five columns for control (depending on usage, four or five recommended). I have a lot of tables to log (hundreds!) with small changes to each of them; that means duplicating the number of tables for a few changes. Speaking of compactness... It also logs everything, not only changed values. It is written in C, so I assume it runs much, much faster (especially needed for highly transactional DBs). But it's not proven to be binary safe (and I don't remember what that is). Bugs: nothing known. So, if you need to be able to restore your table as of any time, use tablelog. If you need to run analysis on who did what, use my option. Finally attaching the code! Cheers. -- Diego Augusto Molina diegoaugustomolina@gmail.com ES: Por favor, evite adjuntar documentos de Microsoft Office. Serán desestimados. EN: Please, avoid attaching Microsoft Office documents. They shall be discarded. LINK: http://www.gnu.org/philosophy/no-word-attachments.html
Attachment
Re: [Solved] Generic logging system for pre-hstore using plperl triggers
From
Diego Augusto Molina
Date:
Some errata and round-up comments. First of all, I think it's relevant to give some info about the system I'm running: OS: Linux Distro: Gentoo Perl: dev-lang/perl-5.12.3-r1 USE="-* berkdb gdbm" Gentoo Base System release: 2.0.2 PORTDIR/metadata/timestamp.chk: Tue, 12 Jul 2011 00:45:01 +0000 2011/10/2, Diego Augusto Molina <diegoaugustomolina@gmail.com>: > solved if we use no composite at all! Instead of a field with an array > of a composite of three intrinsics, use three fields, each of an > intrinsic type. See your proposal: Errata: "use three fields, each *one being an array of an intrinsic type.*" > mentioned earlier (to reduce space: 2 bytes of type "smallint" against > variable size of type "text"; and to keep track of names been used > too). You can also set up something like this if you like dimensions: Errata: "and to keep track of names *being* used too)" > Implying that the first dimension is the "before" value and the second > one is for the "after" value. Any of these prevents us from using > composites and makes the box a little wider and simpler. Even further, > I would like to keep the logging "on demand": Comment on <<I would like to keep the logging "on demand":>> With this I mean to log only those columns which are part of the primary key or whose values have changed -- Diego Augusto Molina diegoaugustomolina@gmail.com ES: Por favor, evite adjuntar documentos de Microsoft Office. Serán desestimados. EN: Please, avoid attaching Microsoft Office documents. They shall be discarded. LINK: http://www.gnu.org/philosophy/no-word-attachments.html
Hi all, I've been using RULES for some time now, and despite the fact that I always had to make "experiments" to get the expected results, it worked for me just fine. Now I have this simple scenario:
---------------------------------------------------
mbr2=# CREATE TEMP TABLE test(a int, b date);
CREATE TABLE
mbr2=# INSERT INTO test (a) values (22);
INSERT 0 1
mbr2=# CREATE TEMP VIEW testing AS SELECT * from test;
CREATE VIEW
mbr2=# SELECT * from testing;
 a  | b
----+---
 22 |
(1 row)

mbr2=# CREATE RULE chg_testing_1 AS ON UPDATE TO testing where new.b is not null do instead select '+++';
CREATE RULE
mbr2=# CREATE RULE chg_testing_2 AS ON UPDATE TO testing do instead select '===';
CREATE RULE
mbr2=# UPDATE testing set a=44;
 ?column?
----------
 ===
(1 row)

UPDATE 0
mbr2=# UPDATE testing set a=44, b=current_date;
 ?column?
----------
 ===
(1 row)

UPDATE 0
mbr2=#
------------------------------------------
And I'm completely lost as to how to interpret the outcome: 1. I know that RULEs, contrary to TRIGGERs, "rewrite" the query, resulting in the actual execution of a different query from the one originally scheduled 2. while triggers "take" rows of processed data "and do their job there". Now, the above example does not behave as I would expect. I really don't understand why both of the above updates give the same result here, and it boils down to: how come a RULE can have a look at "NEW.B", if it's supposed to be rewritten *before* any rows are fetched for processing? When exactly does it "have a look" at NEW.B? How do I ensure that the "DO INSTEAD NOTHING" rule *does*not* get executed when a conditional rule does? Can someone give a little explanation here ... or point me to "for dummies" documentation? (yes, I've been through the postgres docs, to no avail). I'll appreciate that, really. -R
On Mon, Oct 10, 2011 at 04:06:34PM +0200, Rafal Pietrak wrote: > Hi all, first of all - why did you send this mail as a reply to some 2-week-old thread, instead of just starting a new thread? > Can someone give a little explanation here ... or point me to "for > dummies" documentation? (yes, I've been through the postgres docs, to no > avail). perhaps this will help: http://www.depesz.com/index.php/2010/06/15/to-rule-or-not-to-rule-that-is-the-question/ Best regards, depesz -- The best thing about modern society is how easy it is to avoid contact with it. http://depesz.com/
On Mon, 2011-10-10 at 16:48 +0200, hubert depesz lubaczewski wrote: > On Mon, Oct 10, 2011 at 04:06:34PM +0200, Rafal Pietrak wrote: > > Hi all, > > first of all - why did you send this mail as reply to some 2-weeks old > thread, instead of just start of new thread? Sorry for that. Old habits... always forget that. > > > Can someone give a little explenation here ... or point me to "for > > dummies" documentation? (yes, I've been though postgres docs, to no > > avail). > > perhaps this will help: > > http://www.depesz.com/index.php/2010/06/15/to-rule-or-not-to-rule-that-is-the-question/ > Hmmm, not really. Still, more puzzles are there, so maybe they'll guide me to the answer eventually. One thing I've spotted there is that earlier I had naively assumed that when I define a RULE INSTEAD, the original query is "discarded" in the final rewritten query. The example found at your link shows that it isn't ... which is *extremely* strange, but somehow explains what I get in my set of rules that were "supposedly" exclusive. ... or maybe this "theory" is also wrong :( Anyway, thanks for the link. -R