Re: Another Plpgsql trigger example - summary table - Mailing list pgsql-patches

From Mark Kirkwood
Subject Re: Another Plpgsql trigger example - summary table
Date
Msg-id 41E3339D.2020200@coretech.co.nz
Whole thread Raw
In response to Re: Another Plpgsql trigger example - summary table  (Mark Kirkwood <markir@coretech.co.nz>)
Responses Re: Another Plpgsql trigger example - summary table  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Another Plpgsql trigger example - summary table  (Mark Kirkwood <markir@coretech.co.nz>)
List pgsql-patches
Mark Kirkwood wrote:

>
> 2) Perhaps leave the trigger + plpgsql function as a plpgsql example,
> and refer to it in the (new) data warehouse section/chapter.
>
Looking at option 2, it seems reasonable to add a trimmed trigger
example into the plpgsql examples section now, and leave the data
warehouse introductory stuff for its own chapter at some later stage.

--- plpgsql.sgml.orig    Tue Jan 11 12:39:17 2005
+++ plpgsql.sgml    Tue Jan 11 13:43:35 2005
@@ -2646,6 +2646,162 @@
     FOR EACH ROW EXECUTE PROCEDURE process_emp_audit();
 </programlisting>
    </example>
+
+   <para>
+    An area where triggers can be useful is maintaining a summary table
+    of another table. The resulting summary can be used in place of the
+    original table for certain queries - with often vastly reduced run
+    times.
+   </para>
+
+   <para>
+    This technique is commonly used in Data Warehousing, where the tables
+    of measured or observed data (called fact tables) can be extremely large.
+    <xref linkend="plpgsql-trigger-summary-example"> shows an example of a
+    trigger procedure in <application>PL/pgSQL</application> that maintains
+    a summary table for a fact table in a data warehouse.
+   </para>
+
+
+   <example id="plpgsql-trigger-summary-example">
+    <title>A <application>PL/pgSQL</application> Trigger Procedure For Maintaining A Summary Table</title>
+
+    <para>
+     The schema detailed here is partly based on the <emphasis>Grocery Store
+     </emphasis> example from <emphasis>The Data Warehouse Toolkit</emphasis>
+     by Ralph Kimball.
+    </para>
+
+<programlisting>
+--
+-- Main tables - time dimension and sales fact.
+--
+CREATE TABLE time_dimension (
+    time_key                    integer NOT NULL,
+    day_of_week                 integer NOT NULL,
+    day_of_month                integer NOT NULL,
+    month                       integer NOT NULL,
+    quarter                     integer NOT NULL,
+    year                        integer NOT NULL
+);
+CREATE UNIQUE INDEX time_dimension_key ON time_dimension(time_key);
+
+CREATE TABLE sales_fact (
+    time_key                    integer NOT NULL,
+    product_key                 integer NOT NULL,
+    store_key                   integer NOT NULL,
+    amount_sold                 numeric(12,2) NOT NULL,
+    units_sold                  integer NOT NULL,
+    amount_cost                 numeric(12,2) NOT NULL
+);
+CREATE INDEX sales_fact_time ON sales_fact(time_key);
+
+--
+-- Summary table - sales by time.
+--
+CREATE TABLE sales_summary_bytime (
+    time_key                    integer NOT NULL,
+    amount_sold                 numeric(15,2) NOT NULL,
+    units_sold                  numeric(12) NOT NULL,
+    amount_cost                 numeric(15,2) NOT NULL
+);
+CREATE UNIQUE INDEX sales_summary_bytime_key ON sales_summary_bytime(time_key);
+
+--
+-- Function and trigger to amend summarized column(s) on UPDATE, INSERT, DELETE.
+--
+CREATE OR REPLACE FUNCTION maint_sales_summary_bytime() RETURNS TRIGGER AS $maint_sales_summary_bytime$
+    DECLARE
+        delta_time_key          integer;
+        delta_amount_sold       numeric(15,2);
+        delta_units_sold        numeric(12);
+        delta_amount_cost       numeric(15,2);
+    BEGIN
+
+        -- Work out the increment/decrement amount(s).
+        IF (TG_OP = 'DELETE') THEN
+
+            delta_time_key = OLD.time_key;
+            delta_amount_sold = -1 * OLD.amount_sold;
+            delta_units_sold = -1 * OLD.units_sold;
+            delta_amount_cost = -1 * OLD.amount_cost;
+
+        ELSIF (TG_OP = 'UPDATE') THEN
+
+            -- forbid updates that change the time_key -
+            -- (probably not too onerous, as DELETE + INSERT is how most
+            -- changes will be made).
+            IF ( OLD.time_key != NEW.time_key) THEN
+                RAISE EXCEPTION 'Update of time_key : % -> % not allowed', OLD.time_key, NEW.time_key;
+            END IF;
+
+            delta_time_key = OLD.time_key;
+            delta_amount_sold = NEW.amount_sold - OLD.amount_sold;
+            delta_units_sold = NEW.units_sold - OLD.units_sold;
+            delta_amount_cost = NEW.amount_cost - OLD.amount_cost;
+
+        ELSIF (TG_OP = 'INSERT') THEN
+
+            delta_time_key = NEW.time_key;
+            delta_amount_sold = NEW.amount_sold;
+            delta_units_sold = NEW.units_sold;
+            delta_amount_cost = NEW.amount_cost;
+
+        END IF;
+
+
+        -- Update the summary row with the new values.
+        UPDATE sales_summary_bytime
+            SET amount_sold = amount_sold + delta_amount_sold,
+                units_sold = units_sold + delta_units_sold,
+                amount_cost = amount_cost + delta_amount_cost
+            WHERE time_key = delta_time_key;
+
+
+        -- There might have been no row with this time_key (e.g new data!).
+        IF (NOT FOUND) THEN
+            BEGIN
+                INSERT INTO sales_summary_bytime (
+                            time_key,
+                            amount_sold,
+                            units_sold,
+                            amount_cost)
+                    SELECT  f.time_key,
+                            sum(f.amount_sold),
+                            sum(f.units_sold),
+                            sum(f.amount_cost)
+                    FROM sales_fact f
+                    WHERE f.time_key = delta_time_key
+                    GROUP BY f.time_key;
+                -- This query can potentially be very expensive if the trigger
+                -- is created on sales_fact without the time_key indexes.
+                -- Some care is needed to ensure that this situation does
+                -- *not* occur.
+            EXCEPTION
+                --
+                -- Catch race condition when two transactions are adding data
+                -- for a new time_key.
+                --
+                WHEN UNIQUE_VIOLATION THEN
+                    UPDATE sales_summary_bytime
+                        SET amount_sold = amount_sold + delta_amount_sold,
+                            units_sold = units_sold + delta_units_sold,
+                            amount_cost = amount_cost + delta_amount_cost
+                        WHERE time_key = delta_time_key;
+
+            END;
+        END IF;
+        RETURN NULL;
+
+    END;
+$maint_sales_summary_bytime$ LANGUAGE plpgsql;
+
+CREATE TRIGGER maint_sales_summary_bytime
+AFTER INSERT OR UPDATE OR DELETE ON sales_fact
+    FOR EACH ROW EXECUTE PROCEDURE maint_sales_summary_bytime();
+</programlisting>
+   </example>
+
   </sect1>

   <!-- **** Porting from Oracle PL/SQL **** -->

pgsql-patches by date:

Previous
From: Alvaro Herrera
Date:
Subject: Translation updates
Next
From: Brendan Jurd
Date:
Subject: Increased error verbosity when querying row-returning functions