Thread: filtering columns in function

filtering columns in function

From
David Ford
Date:
i have several APIs which operate over several different tables. first allow me to show you some plpgsql i use, this is semantically the same in all of my APIs with different names for tables (this is python generating the SQL statements):

            _proc = '''
                CREATE OR REPLACE FUNCTION notify_proc() RETURNS trigger AS $$
                DECLARE
                    _json    json;
                    _record  record;
                BEGIN
                    IF TG_OP = 'INSERT' or TG_OP = 'UPDATE' THEN
                        SELECT TG_TABLE_NAME AS table, TG_OP AS action, NEW.*
                        INTO    _record;
                    ELSE
                        SELECT TG_TABLE_NAME AS table, TG_OP AS action, OLD.*
                        INTO    _record;
                    END IF;

                    _json = row_to_json(_record);
                    PERFORM pg_notify(CAST('dfw' AS text), CAST(_json AS text));

                    IF TG_OP = 'INSERT' or TG_OP = 'UPDATE' THEN
                        RETURN NEW;
                    ELSE
                        RETURN OLD;
                    END IF;

                END;
                $$ LANGUAGE plpgsql;
                '''

            _trig = '''
                DO
                $$
                BEGIN
                    IF NOT EXISTS (SELECT *
                        FROM  information_schema.triggers
                        WHERE event_object_table = '{table}'
                        AND   trigger_name = 'dfw_notify_{table}_{op}'
                    )
                    THEN
                        CREATE TRIGGER dfw_notify_{table}_{op} {when} {op}
                                    ON {table}
                          FOR EACH ROW
                               EXECUTE
                             PROCEDURE notify_proc();
                    END IF;
                END;
                $$
                '''

            c.execute(_proc)

            for table in {'blocklist','recents','filter_meta','filter_whitelist'}:
                for op,when in {'insert':'BEFORE','update':'AFTER','delete':'BEFORE'}.items():
                    c.execute(_trig.format(op=op, when=when, table=table))


because i use this approach frequently for a number of software packages, i'd like to improve on it. pg notifications have a fixed size limit and recently i'm encountering more and more notifications are lost because this size limit is reached -- PG will emit an error instead of performing the notification. some of my databases have dozens of different tables with all sorts of different columns that i use this trigger and function on.

with the background provided, here's the question; without making a complex series of table name if/else stanzas, how can i more succinctly change my select query to omit certain columns?

                    IF TG_OP = 'INSERT' or TG_OP = 'UPDATE' THEN
                        SELECT TG_TABLE_NAME AS table, TG_OP AS action, NEW.*
                        INTO    _record;
                    ELSE
                        SELECT TG_TABLE_NAME AS table, TG_OP AS action, OLD.*
                        INTO    _record;
                    END IF;

in other words, instead of NEW.* or OLD.*, i'd like to do NEW.* EXCLUDING ('somecol1','somecol2')

i've read a lot of dialog debating this over the last decade's worth of pgsql lists but i've not come up with any result which really fits into my usage with ease. i don't want to simply truncate the generated json because i can't guarantee that the columns i really do want will appear in the non-truncated segment. omitting certain columns will fix my overflow problems.

-david

Re: filtering columns in function

From
"David G. Johnston"
Date:
On Tue, Jan 19, 2016 at 12:59 PM, David Ford <firefighterblu3@gmail.com> wrote:
i have several APIs which operate over several different tables. first allow me to show you some plpgsql i use, this is semantically the same in all of my APIs with different names for tables (this is python generating the SQL statements):

            _proc = '''
                CREATE OR REPLACE FUNCTION notify_proc() RETURNS trigger AS $$
                DECLARE
                    _json    json;
                    _record  record;
                BEGIN
                    IF TG_OP = 'INSERT' or TG_OP = 'UPDATE' THEN
                        SELECT TG_TABLE_NAME AS table, TG_OP AS action, NEW.*
                        INTO    _record;
                    ELSE
                        SELECT TG_TABLE_NAME AS table, TG_OP AS action, OLD.*
                        INTO    _record;
                    END IF;

                    _json = row_to_json(_record);
                    PERFORM pg_notify(CAST('dfw' AS text), CAST(_json AS text));

                    IF TG_OP = 'INSERT' or TG_OP = 'UPDATE' THEN
                        RETURN NEW;
                    ELSE
                        RETURN OLD;
                    END IF;

                END;
                $$ LANGUAGE plpgsql;
                '''

            _trig = '''
                DO
                $$
                BEGIN
                    IF NOT EXISTS (SELECT *
                        FROM  information_schema.triggers
                        WHERE event_object_table = '{table}'
                        AND   trigger_name = 'dfw_notify_{table}_{op}'
                    )
                    THEN
                        CREATE TRIGGER dfw_notify_{table}_{op} {when} {op}
                                    ON {table}
                          FOR EACH ROW
                               EXECUTE
                             PROCEDURE notify_proc();
                    END IF;
                END;
                $$
                '''

            c.execute(_proc)

            for table in {'blocklist','recents','filter_meta','filter_whitelist'}:
                for op,when in {'insert':'BEFORE','update':'AFTER','delete':'BEFORE'}.items():
                    c.execute(_trig.format(op=op, when=when, table=table))


because i use this approach frequently for a number of software packages, i'd like to improve on it. pg notifications have a fixed size limit and recently i'm encountering more and more notifications are lost because this size limit is reached -- PG will emit an error instead of performing the notification. some of my databases have dozens of different tables with all sorts of different columns that i use this trigger and function on.

​Notifications generally should contain just enough information in the payload so that the caller can accurately retrieve the​, possibly substantial, data from some known location.


with the background provided, here's the question; without making a complex series of table name if/else stanzas, how can i more succinctly change my select query to omit certain columns?

​I would either leverage the catalog by including a semantic COMMENT on the relevant column OR introduce your own table that maintains said information.  Or, and probably faster, create a custom trigger function for each table using whatever manner of tools you desire to associate the meta-data about relevant columns to each one.

One thought would be to only include the relevant PK fields (already captured in the catalog) along with any timestamp (i.e., modified) fields you use for aiding versioning.
 

                    IF TG_OP = 'INSERT' or TG_OP = 'UPDATE' THEN
                        SELECT TG_TABLE_NAME AS table, TG_OP AS action, NEW.*
                        INTO    _record;
                    ELSE
                        SELECT TG_TABLE_NAME AS table, TG_OP AS action, OLD.*
                        INTO    _record;
                    END IF;

in other words, instead of NEW.* or OLD.*, i'd like to do NEW.* EXCLUDING ('somecol1','somecol2')


​I have, from time to time, wanted such a capability as well...but alas I'm doubtful it will ever be allowed as it goes against the spirit of SQL as presently designed.  I am certain how you think it would you, though...unless you just intend to name every field you don't care about regardless of whether it appears in the query.

​David J.