Thread: transition tables and UPDATE

transition tables and UPDATE

From
Alvaro Herrera
Date:
Earlier today I gave a talk about MERGE and wanted to provide an example
with FOR EACH STATEMENT triggers using transition tables.  However, I
can't find a non-ugly way to obtain the NEW row that corresponds to each
OLD row ...  I had to resort to an ugly trick with OFFSET n LIMIT 1.
Can anyone suggest anything better?  I couldn't find any guidance in the
docs.

This is the example function I wrote:

CREATE FUNCTION wine_audit() RETURNS trigger LANGUAGE plpgsql AS $$
  BEGIN
    IF (TG_OP = 'DELETE') THEN
      INSERT INTO wine_audit
           SELECT 'D', now(), row_to_json(o), NULL FROM old_table o;
    ELSIF (TG_OP = 'INSERT') THEN
      INSERT INTO wine_audit
           SELECT 'I', now(), NULL, row_to_json(n) FROM new_table n;
    ELSIF (TG_OP = 'UPDATE') THEN
      DECLARE
        oldrec record;
    newrec jsonb;
    i      integer := 0;
      BEGIN
        FOR oldrec IN SELECT * FROM old_table LOOP
          newrec := row_to_json(n) FROM new_table n OFFSET i LIMIT 1;
          i := i + 1;
          INSERT INTO wine_audit
               SELECT 'U', now(), row_to_json(oldrec), newrec;
        END LOOP;
      END;

    END IF;
    RETURN NULL;
  END;
$$;

CREATE TABLE wines (winery text, brand text, variety text, year int, bottles int);
CREATE TABLE shipment (LIKE wines);
CREATE TABLE wine_audit (op varchar(1), datetime timestamptz,
             oldrow jsonb, newrow jsonb);

CREATE TRIGGER wine_update
  AFTER UPDATE ON wines
  REFERENCING OLD TABLE AS old_table NEW TABLE AS new_table
  FOR EACH STATEMENT EXECUTE FUNCTION wine_audit();
-- I omit triggers on insert and update because the trigger code for those is trivial

INSERT INTO wines VALUES ('Concha y Toro', 'Sunrise', 'Chardonnay', 2021, 12),
('Concha y Toro', 'Sunrise', 'Merlot', 2022, 12);

INSERT INTO shipment VALUES ('Concha y Toro', 'Sunrise', 'Chardonnay', 2021, 96),
('Concha y Toro', 'Sunrise', 'Merlot', 2022, 120),
('Concha y Toro', 'Marqués de Casa y Concha', 'Carmenere', 2021, 48),
('Concha y Toro', 'Casillero del Diablo', 'Cabernet Sauvignon', 2019, 240);

ALTER TABLE shipment ADD COLUMN marked timestamp with time zone;

WITH unmarked_shipment AS
 (UPDATE shipment SET marked = now() WHERE marked IS NULL
         RETURNING winery, brand, variety, year, bottles)
MERGE INTO wines AS w
     USING (SELECT winery, brand, variety, year,
                        sum(bottles) as bottles
                   FROM unmarked_shipment
               GROUP BY winery, brand, variety, year) AS s
        ON (w.winery, w.brand, w.variety, w.year) =
           (s.winery, s.brand, s.variety, s.year)
WHEN MATCHED THEN
     UPDATE SET bottles = w.bottles + s.bottles
WHEN NOT MATCHED THEN
     INSERT (winery, brand, variety, year, bottles)
     VALUES (s.winery, s.brand, s.variety, s.year, s.bottles)
;


If you examine table wine_audit after pasting all of the above, you'll
see this, which is correct:

─[ RECORD 1 ]────────────────────────────────────────────────────────────────────────────────────────────────────
op       │ U
datetime │ 2023-02-01 01:16:44.704036+01
oldrow   │ {"year": 2021, "brand": "Sunrise", "winery": "Concha y Toro", "bottles": 12, "variety": "Chardonnay"}
newrow   │ {"year": 2021, "brand": "Sunrise", "winery": "Concha y Toro", "bottles": 108, "variety": "Chardonnay"}
─[ RECORD 2 ]────────────────────────────────────────────────────────────────────────────────────────────────────
op       │ U
datetime │ 2023-02-01 01:16:44.704036+01
oldrow   │ {"year": 2022, "brand": "Sunrise", "winery": "Concha y Toro", "bottles": 12, "variety": "Merlot"}
newrow   │ {"year": 2022, "brand": "Sunrise", "winery": "Concha y Toro", "bottles": 132, "variety": "Merlot"}

My question is how to obtain the same rows without the LIMIT/OFFSET line
in the trigger function.


Also: how can we "subtract" both JSON blobs so that the 'newrow' only
contains the members that differ?  I would like to have this:

─[ RECORD 1 ]────────────────────────────────────────────────────────────────────────────────────────────────────
op       │ U
datetime │ 2023-02-01 01:16:44.704036+01
oldrow   │ {"year": 2021, "brand": "Sunrise", "winery": "Concha y Toro", "bottles": 12, "variety": "Chardonnay"}
newrow   │ {"bottles": 108}
─[ RECORD 2 ]────────────────────────────────────────────────────────────────────────────────────────────────────
op       │ U
datetime │ 2023-02-01 01:16:44.704036+01
oldrow   │ {"year": 2022, "brand": "Sunrise", "winery": "Concha y Toro", "bottles": 12, "variety": "Merlot"}
newrow   │ {"bottles": 132}

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
"La gente vulgar sólo piensa en pasar el tiempo;
el que tiene talento, en aprovecharlo"



Re: transition tables and UPDATE

From
Thomas Munro
Date:
On Wed, Feb 1, 2023 at 10:18 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> Earlier today I gave a talk about MERGE and wanted to provide an example
> with FOR EACH STATEMENT triggers using transition tables.  However, I
> can't find a non-ugly way to obtain the NEW row that corresponds to each
> OLD row ...  I had to resort to an ugly trick with OFFSET n LIMIT 1.
> Can anyone suggest anything better?  I couldn't find any guidance in the
> docs.

I don't know the answer, either in PostgreSQL or the SQL spec.  I
wondered if there *should* be a way here:

https://www.postgresql.org/message-id/CAEepm=1ncxBNna-pXGr2hnMHRyYi_6_AwG_352-Jn=mwdFdAGw@mail.gmail.com



Re: transition tables and UPDATE

From
Alvaro Herrera
Date:
On 2023-Feb-01, Thomas Munro wrote:

> On Wed, Feb 1, 2023 at 10:18 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> > Earlier today I gave a talk about MERGE and wanted to provide an example
> > with FOR EACH STATEMENT triggers using transition tables.  However, I
> > can't find a non-ugly way to obtain the NEW row that corresponds to each
> > OLD row ...  I had to resort to an ugly trick with OFFSET n LIMIT 1.
> > Can anyone suggest anything better?  I couldn't find any guidance in the
> > docs.
> 
> I don't know the answer, either in PostgreSQL or the SQL spec.  I
> wondered if there *should* be a way here:
> 
> https://www.postgresql.org/message-id/CAEepm=1ncxBNna-pXGr2hnMHRyYi_6_AwG_352-Jn=mwdFdAGw@mail.gmail.com

I had tried to tie these relations using WITH ORDINALITY, but the only
way I could think of (array_agg to then unnest() WITH ORDINALITY) was
even uglier than what I already had.  So yeah, I think it might be
useful if we had a way to inject a counter or something in there.

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"The important things in the world are problems with society that we don't
understand at all. The machines will become more complicated but they won't
be more complicated than the societies that run them."    (Freeman Dyson)



Re: transition tables and UPDATE

From
Dean Rasheed
Date:
On Wed, 1 Feb 2023 at 12:12, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> I had tried to tie these relations using WITH ORDINALITY, but the only
> way I could think of (array_agg to then unnest() WITH ORDINALITY) was
> even uglier than what I already had.  So yeah, I think it might be
> useful if we had a way to inject a counter or something in there.
>

You could use a pair of cursors like this:

CREATE OR REPLACE FUNCTION wine_audit() RETURNS trigger LANGUAGE plpgsql AS $$
  BEGIN
    IF (TG_OP = 'DELETE') THEN
      INSERT INTO wine_audit
           SELECT 'D', now(), row_to_json(o), NULL FROM old_table o;
    ELSIF (TG_OP = 'INSERT') THEN
      INSERT INTO wine_audit
           SELECT 'I', now(), NULL, row_to_json(n) FROM new_table n;
    ELSIF (TG_OP = 'UPDATE') THEN
      DECLARE
        oldcur CURSOR FOR SELECT row_to_json(o) FROM old_table o;
        newcur CURSOR FOR SELECT row_to_json(n) FROM new_table n;
        oldrec jsonb;
        newrec jsonb;
      BEGIN
        OPEN oldcur;
        OPEN newcur;

        LOOP
          FETCH oldcur INTO oldrec;
          EXIT WHEN NOT FOUND;

          FETCH newcur INTO newrec;
          EXIT WHEN NOT FOUND;

          INSERT INTO wine_audit VALUES('U', now(), oldrec, newrec);
        END LOOP;

        CLOSE oldcur;
        CLOSE newcur;
      END;

    END IF;
    RETURN NULL;
  END;
$$;

though it would be nicer if there was a way to do it in a single SQL statement.

Regards,
Dean



Re: transition tables and UPDATE

From
Yugo NAGATA
Date:
On Wed, 1 Feb 2023 10:03:26 +0100
Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

> Earlier today I gave a talk about MERGE and wanted to provide an example
> with FOR EACH STATEMENT triggers using transition tables.  However, I
> can't find a non-ugly way to obtain the NEW row that corresponds to each
> OLD row ...  I had to resort to an ugly trick with OFFSET n LIMIT 1.
> Can anyone suggest anything better?  I couldn't find any guidance in the
> docs.

What I could come up with is to join old_table and new_table using keys
of the wine table (winery, brand, variety, year), or join them using
values from row_number(), like;

 INSERT INTO wine_audit
  SELECT 'U', now(), row_o, row_n
   FROM (SELECT row_number() OVER() i, row_to_json(o) row_o FROM old_table o) 
   JOIN (SELECT row_number() OVER() i, row_to_json(n) row_n FROM new_table n) 
     USING (i);

> 
> This is the example function I wrote:
> 
> CREATE FUNCTION wine_audit() RETURNS trigger LANGUAGE plpgsql AS $$
>   BEGIN
>     IF (TG_OP = 'DELETE') THEN
>       INSERT INTO wine_audit
>            SELECT 'D', now(), row_to_json(o), NULL FROM old_table o;
>     ELSIF (TG_OP = 'INSERT') THEN
>       INSERT INTO wine_audit
>            SELECT 'I', now(), NULL, row_to_json(n) FROM new_table n;
>     ELSIF (TG_OP = 'UPDATE') THEN
>       DECLARE
>         oldrec record;
>     newrec jsonb;
>     i      integer := 0;
>       BEGIN
>         FOR oldrec IN SELECT * FROM old_table LOOP
>           newrec := row_to_json(n) FROM new_table n OFFSET i LIMIT 1;
>           i := i + 1;
>           INSERT INTO wine_audit
>                SELECT 'U', now(), row_to_json(oldrec), newrec;
>         END LOOP;
>       END;
> 
>     END IF;
>     RETURN NULL;
>   END;
> $$;
> 
> CREATE TABLE wines (winery text, brand text, variety text, year int, bottles int);
> CREATE TABLE shipment (LIKE wines);
> CREATE TABLE wine_audit (op varchar(1), datetime timestamptz,
>              oldrow jsonb, newrow jsonb);
> 
> CREATE TRIGGER wine_update
>   AFTER UPDATE ON wines
>   REFERENCING OLD TABLE AS old_table NEW TABLE AS new_table
>   FOR EACH STATEMENT EXECUTE FUNCTION wine_audit();
> -- I omit triggers on insert and update because the trigger code for those is trivial
> 
> INSERT INTO wines VALUES ('Concha y Toro', 'Sunrise', 'Chardonnay', 2021, 12),
> ('Concha y Toro', 'Sunrise', 'Merlot', 2022, 12);
> 
> INSERT INTO shipment VALUES ('Concha y Toro', 'Sunrise', 'Chardonnay', 2021, 96),
> ('Concha y Toro', 'Sunrise', 'Merlot', 2022, 120),
> ('Concha y Toro', 'Marqués de Casa y Concha', 'Carmenere', 2021, 48),
> ('Concha y Toro', 'Casillero del Diablo', 'Cabernet Sauvignon', 2019, 240);
> 
> ALTER TABLE shipment ADD COLUMN marked timestamp with time zone;
> 
> WITH unmarked_shipment AS
>  (UPDATE shipment SET marked = now() WHERE marked IS NULL
>          RETURNING winery, brand, variety, year, bottles)
> MERGE INTO wines AS w
>      USING (SELECT winery, brand, variety, year,
>                         sum(bottles) as bottles
>                    FROM unmarked_shipment
>                GROUP BY winery, brand, variety, year) AS s
>         ON (w.winery, w.brand, w.variety, w.year) =
>            (s.winery, s.brand, s.variety, s.year)
> WHEN MATCHED THEN
>      UPDATE SET bottles = w.bottles + s.bottles
> WHEN NOT MATCHED THEN
>      INSERT (winery, brand, variety, year, bottles)
>      VALUES (s.winery, s.brand, s.variety, s.year, s.bottles)
> ;
> 
> 
> If you examine table wine_audit after pasting all of the above, you'll
> see this, which is correct:
> 
> ─[ RECORD 1 ]────────────────────────────────────────────────────────────────────────────────────────────────────
> op       │ U
> datetime │ 2023-02-01 01:16:44.704036+01
> oldrow   │ {"year": 2021, "brand": "Sunrise", "winery": "Concha y Toro", "bottles": 12, "variety": "Chardonnay"}
> newrow   │ {"year": 2021, "brand": "Sunrise", "winery": "Concha y Toro", "bottles": 108, "variety": "Chardonnay"}
> ─[ RECORD 2 ]────────────────────────────────────────────────────────────────────────────────────────────────────
> op       │ U
> datetime │ 2023-02-01 01:16:44.704036+01
> oldrow   │ {"year": 2022, "brand": "Sunrise", "winery": "Concha y Toro", "bottles": 12, "variety": "Merlot"}
> newrow   │ {"year": 2022, "brand": "Sunrise", "winery": "Concha y Toro", "bottles": 132, "variety": "Merlot"}
> 
> My question is how to obtain the same rows without the LIMIT/OFFSET line
> in the trigger function.
> 
> 
> Also: how can we "subtract" both JSON blobs so that the 'newrow' only
> contains the members that differ?  I would like to have this:
> 
> ─[ RECORD 1 ]────────────────────────────────────────────────────────────────────────────────────────────────────
> op       │ U
> datetime │ 2023-02-01 01:16:44.704036+01
> oldrow   │ {"year": 2021, "brand": "Sunrise", "winery": "Concha y Toro", "bottles": 12, "variety": "Chardonnay"}
> newrow   │ {"bottles": 108}
> ─[ RECORD 2 ]────────────────────────────────────────────────────────────────────────────────────────────────────
> op       │ U
> datetime │ 2023-02-01 01:16:44.704036+01
> oldrow   │ {"year": 2022, "brand": "Sunrise", "winery": "Concha y Toro", "bottles": 12, "variety": "Merlot"}
> newrow   │ {"bottles": 132}
> 
> -- 
> Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
> "La gente vulgar sólo piensa en pasar el tiempo;
> el que tiene talento, en aprovecharlo"
> 
> 


-- 
Yugo NAGATA <nagata@sraoss.co.jp>



Re: transition tables and UPDATE

From
Corey Huinker
Date:

even uglier than what I already had.  So yeah, I think it might be
useful if we had a way to inject a counter or something in there.


This came up for me when I was experimenting with making the referential integrity triggers fire on statements rather than rows. Doing so has the potential to take a lot of the sting out of big deletes where the referencing column isn't indexed (thus resulting in N sequentials scans of the referencing table). If that were 1 statement then we'd get a single (still ugly) hash join, but it's an improvement.

It has been suggested that the the overhead of forming the tuplestores of affected rows and reconstituting them into EphemerialNamedRelations could be made better by instead storing an array of old ctids and new ctids, which obviously would be in the same order, if we had a means of reconstituting those with just the columns needed for the check (and generating a fake ordinality column for your needs), that would be considerably lighter weight than the tuplestores, and it might make statement level triggers more useful all around.