Thread: transition tables and UPDATE
Earlier today I gave a talk about MERGE and wanted to provide an example
with FOR EACH STATEMENT triggers using transition tables.  However, I
can't find a non-ugly way to obtain the NEW row that corresponds to each
OLD row ... I had to resort to an ugly trick with OFFSET n LIMIT 1.
Can anyone suggest anything better?  I couldn't find any guidance in the
docs.

This is the example function I wrote:

CREATE FUNCTION wine_audit() RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
  IF (TG_OP = 'DELETE') THEN
    INSERT INTO wine_audit
      SELECT 'D', now(), row_to_json(o), NULL FROM old_table o;
  ELSIF (TG_OP = 'INSERT') THEN
    INSERT INTO wine_audit
      SELECT 'I', now(), NULL, row_to_json(n) FROM new_table n;
  ELSIF (TG_OP = 'UPDATE') THEN
    DECLARE
      oldrec record;
      newrec jsonb;
      i integer := 0;
    BEGIN
      FOR oldrec IN SELECT * FROM old_table LOOP
        newrec := row_to_json(n) FROM new_table n OFFSET i LIMIT 1;
        i := i + 1;
        INSERT INTO wine_audit
          SELECT 'U', now(), row_to_json(oldrec), newrec;
      END LOOP;
    END;
  END IF;
  RETURN NULL;
END;
$$;

CREATE TABLE wines (winery text, brand text, variety text, year int, bottles int);
CREATE TABLE shipment (LIKE wines);
CREATE TABLE wine_audit (op varchar(1), datetime timestamptz,
                         oldrow jsonb, newrow jsonb);

CREATE TRIGGER wine_update
  AFTER UPDATE ON wines
  REFERENCING OLD TABLE AS old_table NEW TABLE AS new_table
  FOR EACH STATEMENT EXECUTE FUNCTION wine_audit();
-- I omit the insert and delete triggers because the trigger code for those is trivial

INSERT INTO wines VALUES ('Concha y Toro', 'Sunrise', 'Chardonnay', 2021, 12),
  ('Concha y Toro', 'Sunrise', 'Merlot', 2022, 12);

INSERT INTO shipment VALUES ('Concha y Toro', 'Sunrise', 'Chardonnay', 2021, 96),
  ('Concha y Toro', 'Sunrise', 'Merlot', 2022, 120),
  ('Concha y Toro', 'Marqués de Casa y Concha', 'Carmenere', 2021, 48),
  ('Concha y Toro', 'Casillero del Diablo', 'Cabernet Sauvignon', 2019, 240);

ALTER TABLE shipment ADD COLUMN marked timestamp with time zone;

WITH unmarked_shipment AS
  (UPDATE shipment SET marked = now() WHERE marked IS NULL
   RETURNING winery, brand, variety, year, bottles)
MERGE INTO wines AS w
USING (SELECT winery, brand, variety, year,
              sum(bottles) as bottles
         FROM unmarked_shipment
        GROUP BY winery, brand, variety, year) AS s
   ON (w.winery, w.brand, w.variety, w.year) =
      (s.winery, s.brand, s.variety, s.year)
WHEN MATCHED THEN
  UPDATE SET bottles = w.bottles + s.bottles
WHEN NOT MATCHED THEN
  INSERT (winery, brand, variety, year, bottles)
  VALUES (s.winery, s.brand, s.variety, s.year, s.bottles)
;

If you examine table wine_audit after pasting all of the above, you'll
see this, which is correct:

─[ RECORD 1 ]──────────────────────────────────────────────────────────────────────────────────────
op       │ U
datetime │ 2023-02-01 01:16:44.704036+01
oldrow   │ {"year": 2021, "brand": "Sunrise", "winery": "Concha y Toro", "bottles": 12, "variety": "Chardonnay"}
newrow   │ {"year": 2021, "brand": "Sunrise", "winery": "Concha y Toro", "bottles": 108, "variety": "Chardonnay"}
─[ RECORD 2 ]──────────────────────────────────────────────────────────────────────────────────────
op       │ U
datetime │ 2023-02-01 01:16:44.704036+01
oldrow   │ {"year": 2022, "brand": "Sunrise", "winery": "Concha y Toro", "bottles": 12, "variety": "Merlot"}
newrow   │ {"year": 2022, "brand": "Sunrise", "winery": "Concha y Toro", "bottles": 132, "variety": "Merlot"}

My question is how to obtain the same rows without the LIMIT/OFFSET line
in the trigger function.
Also: how can we "subtract" both JSON blobs so that the 'newrow' only
contains the members that differ?  I would like to have this:

─[ RECORD 1 ]──────────────────────────────────────────────────────────────────────────────────────
op       │ U
datetime │ 2023-02-01 01:16:44.704036+01
oldrow   │ {"year": 2021, "brand": "Sunrise", "winery": "Concha y Toro", "bottles": 12, "variety": "Chardonnay"}
newrow   │ {"bottles": 108}
─[ RECORD 2 ]──────────────────────────────────────────────────────────────────────────────────────
op       │ U
datetime │ 2023-02-01 01:16:44.704036+01
oldrow   │ {"year": 2022, "brand": "Sunrise", "winery": "Concha y Toro", "bottles": 12, "variety": "Merlot"}
newrow   │ {"bottles": 132}

-- 
Álvaro Herrera         Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
"Ordinary people think only of passing the time;
those with talent, of making use of it"
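[One way to get that "subtracted" form, sketched here as a query over the audit
table rather than inside the trigger (an untested sketch; it uses only
jsonb_each() and jsonb_object_agg(), and compares top-level keys only):

SELECT a.op, a.datetime, a.oldrow, d.newrow
  FROM wine_audit a,
       LATERAL (SELECT jsonb_object_agg(n.key, n.value) AS newrow
                  FROM jsonb_each(a.newrow) AS n(key, value)
                  JOIN jsonb_each(a.oldrow) AS o(key, value) ON o.key = n.key
                 WHERE n.value IS DISTINCT FROM o.value) AS d  -- keep only changed members
 WHERE a.op = 'U';

The same jsonb_object_agg() expression could be applied to newrec inside the
trigger function so that only the differing members are stored in the first
place; note that it yields NULL rather than an empty object when nothing
differs.]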
On Wed, Feb 1, 2023 at 10:18 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> Earlier today I gave a talk about MERGE and wanted to provide an example
> with FOR EACH STATEMENT triggers using transition tables.  However, I
> can't find a non-ugly way to obtain the NEW row that corresponds to each
> OLD row ... I had to resort to an ugly trick with OFFSET n LIMIT 1.
> Can anyone suggest anything better?  I couldn't find any guidance in the
> docs.

I don't know the answer, either in PostgreSQL or the SQL spec.  I wondered
if there *should* be a way here:

https://www.postgresql.org/message-id/CAEepm=1ncxBNna-pXGr2hnMHRyYi_6_AwG_352-Jn=mwdFdAGw@mail.gmail.com
On 2023-Feb-01, Thomas Munro wrote:

> On Wed, Feb 1, 2023 at 10:18 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> > Earlier today I gave a talk about MERGE and wanted to provide an example
> > with FOR EACH STATEMENT triggers using transition tables.  However, I
> > can't find a non-ugly way to obtain the NEW row that corresponds to each
> > OLD row ... I had to resort to an ugly trick with OFFSET n LIMIT 1.
> > Can anyone suggest anything better?  I couldn't find any guidance in the
> > docs.
>
> I don't know the answer, either in PostgreSQL or the SQL spec.  I wondered
> if there *should* be a way here:
>
> https://www.postgresql.org/message-id/CAEepm=1ncxBNna-pXGr2hnMHRyYi_6_AwG_352-Jn=mwdFdAGw@mail.gmail.com

I had tried to tie these relations using WITH ORDINALITY, but the only
way I could think of (array_agg to then unnest() WITH ORDINALITY) was
even uglier than what I already had.  So yeah, I think it might be
useful if we had a way to inject a counter or something in there.

-- 
Álvaro Herrera        PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"The important things in the world are problems with society that we don't
understand at all.  The machines will become more complicated but they won't
be more complicated than the societies that run them."        (Freeman Dyson)
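[For concreteness, the array_agg()/unnest() WITH ORDINALITY approach mentioned
above might look roughly like this (a sketch, not the query from the talk;
like the OFFSET trick it relies on the two transition tables being scanned in
matching order):

INSERT INTO wine_audit
SELECT 'U', now(), o.r, n.r
  FROM unnest((SELECT array_agg(row_to_json(ot)::jsonb) FROM old_table ot))
       WITH ORDINALITY AS o(r, i)           -- i numbers the old rows
  JOIN unnest((SELECT array_agg(row_to_json(nt)::jsonb) FROM new_table nt))
       WITH ORDINALITY AS n(r, i)           -- i numbers the new rows
 USING (i);                                 -- pair old/new rows by position]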
On Wed, 1 Feb 2023 at 12:12, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> I had tried to tie these relations using WITH ORDINALITY, but the only
> way I could think of (array_agg to then unnest() WITH ORDINALITY) was
> even uglier than what I already had.  So yeah, I think it might be
> useful if we had a way to inject a counter or something in there.
>

You could use a pair of cursors like this:

CREATE OR REPLACE FUNCTION wine_audit() RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
  IF (TG_OP = 'DELETE') THEN
    INSERT INTO wine_audit
      SELECT 'D', now(), row_to_json(o), NULL FROM old_table o;
  ELSIF (TG_OP = 'INSERT') THEN
    INSERT INTO wine_audit
      SELECT 'I', now(), NULL, row_to_json(n) FROM new_table n;
  ELSIF (TG_OP = 'UPDATE') THEN
    DECLARE
      oldcur CURSOR FOR SELECT row_to_json(o) FROM old_table o;
      newcur CURSOR FOR SELECT row_to_json(n) FROM new_table n;
      oldrec jsonb;
      newrec jsonb;
    BEGIN
      OPEN oldcur;
      OPEN newcur;
      LOOP
        FETCH oldcur INTO oldrec;
        EXIT WHEN NOT FOUND;
        FETCH newcur INTO newrec;
        EXIT WHEN NOT FOUND;
        INSERT INTO wine_audit VALUES ('U', now(), oldrec, newrec);
      END LOOP;
      CLOSE oldcur;
      CLOSE newcur;
    END;
  END IF;
  RETURN NULL;
END;
$$;

though it would be nicer if there was a way to do it in a single SQL statement.

Regards,
Dean
On Wed, 1 Feb 2023 10:03:26 +0100
Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

> Earlier today I gave a talk about MERGE and wanted to provide an example
> with FOR EACH STATEMENT triggers using transition tables.  However, I
> can't find a non-ugly way to obtain the NEW row that corresponds to each
> OLD row ... I had to resort to an ugly trick with OFFSET n LIMIT 1.
> Can anyone suggest anything better?  I couldn't find any guidance in the
> docs.

What I could come up with is to join old_table and new_table using keys of
the wines table (winery, brand, variety, year), or join them using values
from row_number(), like:

INSERT INTO wine_audit
  SELECT 'U', now(), row_o, row_n
    FROM (SELECT row_number() OVER () AS i, row_to_json(o) AS row_o FROM old_table o) AS o
    JOIN (SELECT row_number() OVER () AS i, row_to_json(n) AS row_n FROM new_table n) AS n
   USING (i);

-- 
Yugo NAGATA <nagata@sraoss.co.jp>
"winery": "Concha y Toro", "bottles": 108, "variety": "Chardonnay"} > ─[ RECORD 2 ]──────────────────────────────────────────────────────────────────────────────────────────────────── > op │ U > datetime │ 2023-02-01 01:16:44.704036+01 > oldrow │ {"year": 2022, "brand": "Sunrise", "winery": "Concha y Toro", "bottles": 12, "variety": "Merlot"} > newrow │ {"year": 2022, "brand": "Sunrise", "winery": "Concha y Toro", "bottles": 132, "variety": "Merlot"} > > My question is how to obtain the same rows without the LIMIT/OFFSET line > in the trigger function. > > > Also: how can we "subtract" both JSON blobs so that the 'newrow' only > contains the members that differ? I would like to have this: > > ─[ RECORD 1 ]──────────────────────────────────────────────────────────────────────────────────────────────────── > op │ U > datetime │ 2023-02-01 01:16:44.704036+01 > oldrow │ {"year": 2021, "brand": "Sunrise", "winery": "Concha y Toro", "bottles": 12, "variety": "Chardonnay"} > newrow │ {"bottles": 108} > ─[ RECORD 2 ]──────────────────────────────────────────────────────────────────────────────────────────────────── > op │ U > datetime │ 2023-02-01 01:16:44.704036+01 > oldrow │ {"year": 2022, "brand": "Sunrise", "winery": "Concha y Toro", "bottles": 12, "variety": "Merlot"} > newrow │ {"bottles": 132} > > -- > Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/ > "La gente vulgar sólo piensa en pasar el tiempo; > el que tiene talento, en aprovecharlo" > > -- Yugo NAGATA <nagata@sraoss.co.jp>
> even uglier than what I already had.  So yeah, I think it might be
> useful if we had a way to inject a counter or something in there.
This came up for me when I was experimenting with making the referential
integrity triggers fire on statements rather than rows.  Doing so has the
potential to take a lot of the sting out of big deletes where the referencing
column isn't indexed (which today results in N sequential scans of the
referencing table).  If that check were done once per statement instead, we'd
get a single (still ugly) hash join, which is an improvement.
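[To make that concrete, here is roughly the kind of single query a
statement-level ON DELETE check could run against the transition table; the
schema names (pk_table, fk_table, ref_id, id) are made up for illustration,
and this is not what ri_triggers.c actually does today:

-- Hypothetical schema: fk_table(ref_id) REFERENCES pk_table(id), ref_id unindexed.
SELECT o.id
  FROM old_table o                     -- transition table of deleted pk_table rows
  JOIN fk_table f ON f.ref_id = o.id   -- one hash join instead of N seqscans
 LIMIT 1;                              -- any row returned means the delete violates the FK]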
It has been suggested that the overhead of forming the tuplestores of affected
rows and reconstituting them into EphemeralNamedRelations could be reduced by
instead storing an array of old ctids and new ctids, which would obviously be
in the same order.  If we had a means of reconstituting those with just the
columns needed for the check (and generating a fake ordinality column for your
needs), that would be considerably lighter weight than the tuplestores, and it
might make statement-level triggers more useful all around.