Thread: Use ctid in where clause in update from statement
Hello,
I come from the Oracle world and am trying to convert some queries to PostgreSQL syntax. One of these queries is a MERGE statement, which I converted into an UPDATE SET FROM WHERE construct. In the original query I use the pseudo column ROWID to match a source row with a target row.
This is a simplified version of such a query:
MERGE INTO test_large d
USING (SELECT ROWID, test_large.*
FROM test_large
WHERE grp = 1) s
ON (d.ROWID = s.ROWID)
WHEN MATCHED
THEN
UPDATE SET d.grp = s.grp;
It has the following execution plan:
---------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------
| 0 | MERGE STATEMENT | | 1000 | 16000 | 2006 (0)| 00:00:25 |
| 1 | MERGE | TEST_LARGE | | | | |
| 2 | VIEW | | | | | |
| 3 | NESTED LOOPS | | 1000 | 220K| 2006 (0)| 00:00:25 |
| 4 | TABLE ACCESS BY INDEX ROWID| TEST_LARGE | 1000 | 110K| 1006 (0)| 00:00:13 |
|* 5 | INDEX RANGE SCAN | IX_TL_GRP | 1000 | | 6 (0)| 00:00:01 |
| 6 | TABLE ACCESS BY USER ROWID | TEST_LARGE | 1 | 113 | 1 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------
In the PostgreSQL version I use the column ctid for this. The above query becomes:
UPDATE test_large d
SET grp = s.grp
FROM (SELECT ctid, test_large.*
FROM test_large
WHERE grp = 1) s
WHERE d.ctid = s.ctid;
I noticed that the execution plan for such a statement seems to be suboptimal:
Update on test_large d (cost=155346.98..160367.08 rows=996 width=53)
-> Merge Join (cost=155346.98..160367.08 rows=996 width=53)
Merge Cond: (d.ctid = test_large.ctid)
-> Sort (cost=154330.57..156833.16 rows=1001033 width=43)
Sort Key: d.ctid
-> Seq Scan on test_large d (cost=0.00..19366.33 rows=1001033 width=43)
-> Sort (cost=1016.40..1018.89 rows=996 width=10)
Sort Key: test_large.ctid
-> Index Scan using ix_tl_grp on test_large (cost=0.42..966.80 rows=996 width=10)
Index Cond: (grp = 1)
I expected a tid scan to be used instead of a Seq scan. I did a VACUUM test_large to make sure statistics are valid.
I would suspect that a tid scan is even a bit faster than accessing via the primary key, since reading the index is not required.
I know I could change the WHERE clause to use the primary key, which results in the following plan:
Update on test_large d (cost=0.85..3344.12 rows=995 width=57)
-> Nested Loop (cost=0.85..3344.12 rows=995 width=57)
-> Index Scan using ix_tl_grp on test_large (cost=0.42..994.84 rows=995 width=14)
Index Cond: (grp = 1)
-> Index Scan using pk_test_large on test_large d (cost=0.42..2.36 rows=1 width=47)
Index Cond: (id = test_large.id)
So my question is more "Is a tid scan not possible / useful for an update" than "how do I accelerate this query".
BR
Dirk
-- Dirk Mika Software Developer |
|
fon +49 2202 2401-1197 |
AG Köln HRB 47509 * WEEE-Reg.-Nr. DE 90029884 Geschäftsführer: Harald Mika, Jörg Mika |
Attachment
Postgresql supports upserts : https://www.postgresql.org/docs/11/sql-insert.html "ON CONFLICT"Hello,
I come from the Oracle world and am trying to convert some queries to PostgreSQL syntax. One of these queries is a MERGE statement, which I converted into an UPDATE SET FROM WHERE construct. In the original query I use the pseudo column ROWID to match a source row with a target row.
This is a simplified version of such a query:
BR
Dirk
--
Dirk Mika
Software Developer
mika:timing GmbH
Strundepark - Kürtener Str. 11b
51465 Bergisch Gladbach
Germanyfon +49 2202 2401-1197
dirk.mika@mikatiming.de
www.mikatiming.deAG Köln HRB 47509 * WEEE-Reg.-Nr. DE 90029884
Geschäftsführer: Harald Mika, Jörg Mika
-- Achilleas Mantzios IT DEV Lead IT DEPT Dynacom Tankers Mgmt
Attachment
> I come from the Oracle world and am trying to convert some queries to > PostgreSQL syntax. One of these queries is a MERGE statement, which I > converted into an UPDATE SET FROM WHERE construct. In the original > query I use the pseudo column ROWID to match a source row with a > target row. > > In the PostgreSQL version I use the column ctid for this. The above query becomes: > > *UPDATE*test_large d > *SET* grp = s.grp > *FROM* (*SELECT* ctid, test_large.* > *FROM* test_large > *WHERE* grp = 1) s > *WHERE* d.ctid = s.ctid; Why don't you join on the primary key column? The ctid comparison is typically quite slow. Thomas
UPDATE test_large d
SET grp = s.grp
FROM (SELECT ctid, test_large.*
FROM test_large
WHERE grp = 1) s
WHERE d.ctid = s.ctid;
Besides, what the above does is select for rows that have grp=1 and then set grp to the same value.
BR
Dirk
--
Dirk Mika
Software Developer
mika:timing GmbH
Strundepark - Kürtener Str. 11b
51465 Bergisch Gladbach
Germanyfon +49 2202 2401-1197
dirk.mika@mikatiming.de
www.mikatiming.deAG Köln HRB 47509 * WEEE-Reg.-Nr. DE 90029884
Geschäftsführer: Harald Mika, Jörg Mika
-- Achilleas Mantzios IT DEV Lead IT DEPT Dynacom Tankers Mgmt
Attachment
I know I can join using the pk, but in oracle using the rowid is faster so I wanted to know, if this is possible in PostgreSQLas well. Dirk -- Dirk Mika Software Developer mika:timing GmbH Strundepark - Kürtener Str. 11b 51465 Bergisch Gladbach Germany fon +49 2202 2401-1197 dirk.mika@mikatiming.de www.mikatiming.de AG Köln HRB 47509 * WEEE-Reg.-Nr. DE 90029884 Geschäftsführer: Harald Mika, Jörg Mika ## How2Use ## the ChampionChip by mika:timing ## https://youtu.be/qfOFXrpSKLQ Am 01.07.19, 11:36 schrieb "Thomas Kellerer" <spam_eater@gmx.net>: > I come from the Oracle world and am trying to convert some queries to > PostgreSQL syntax. One of these queries is a MERGE statement, which I > converted into an UPDATE SET FROM WHERE construct. In the original > query I use the pseudo column ROWID to match a source row with a > target row. > > In the PostgreSQL version I use the column ctid for this. The above query becomes: > > *UPDATE*test_large d > *SET* grp = s.grp > *FROM* (*SELECT* ctid, test_large.* > *FROM* test_large > *WHERE* grp = 1) s > *WHERE* d.ctid = s.ctid; Why don't you join on the primary key column? The ctid comparison is typically quite slow. Thomas
This is just a very simplified sample and doesn’t makes sense, but it shows the execution plan. The original query is more complex. It selects more values and updates more columns.
Dirk
fon +49 2202 2401-1197 --
Dirk Mika
Software Developer
mika:timing GmbH
Strundepark - Kürtener Str. 11b
51465 Bergisch Gladbach
Germany
dirk.mika@mikatiming.de
www.mikatiming.deAG Köln HRB 47509 * WEEE-Reg.-Nr. DE 90029884
Geschäftsführer: Harald Mika, Jörg Mika
Von: Achilleas Mantzios <achill@matrix.gatewaynet.com>
Datum: Montag, 1. Juli 2019 um 11:50
An: "pgsql-general@lists.postgresql.org" <pgsql-general@lists.postgresql.org>
Betreff: Re: Use ctid in where clause in update from statement
On 1/7/19 12:13 μ.μ., Dirk Mika wrote:
UPDATE test_large d
SET grp = s.grp
FROM (SELECT ctid, test_large.*
FROM test_large
WHERE grp = 1) s
WHERE d.ctid = s.ctid;
Besides, what the above does is select for rows that have grp=1 and then set grp to the same value.
BR
Dirk
--
Dirk Mika
Software Developer
mika:timing GmbH
Strundepark - Kürtener Str. 11b
51465 Bergisch Gladbach
Germanyfon +49 2202 2401-1197
dirk.mika@mikatiming.de
www.mikatiming.deAG Köln HRB 47509 * WEEE-Reg.-Nr. DE 90029884
Geschäftsführer: Harald Mika, Jörg Mika
--
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt
Attachment
Hi,
The problem with the INSERT ON CONFLICT is that an insert is tried here first, which may fire triggers.
In my case there is a fairly expensive INSERT Trigger ON EACH ROW, which I would like to avoid.
Since the SELECT statement only returns records whose PK already exists, I thought an UPDATE FROM would be the better choice.
As I said, with the PK as JOIN condition it works fine too. I'm only interested in why no tid scan is used.
Dirk
fon +49 2202 2401-1197 --
Dirk Mika
Software Developer
mika:timing GmbH
Strundepark - Kürtener Str. 11b
51465 Bergisch Gladbach
Germany
dirk.mika@mikatiming.de
www.mikatiming.deAG Köln HRB 47509 * WEEE-Reg.-Nr. DE 90029884
Geschäftsführer: Harald Mika, Jörg Mika
Von: Achilleas Mantzios <achill@matrix.gatewaynet.com>
Datum: Montag, 1. Juli 2019 um 11:27
An: "pgsql-general@lists.postgresql.org" <pgsql-general@lists.postgresql.org>
Betreff: Re: Use ctid in where clause in update from statement
On 1/7/19 12:13 μ.μ., Dirk Mika wrote:
Hello,
I come from the Oracle world and am trying to convert some queries to PostgreSQL syntax. One of these queries is a MERGE statement, which I converted into an UPDATE SET FROM WHERE construct. In the original query I use the pseudo column ROWID to match a source row with a target row.
This is a simplified version of such a query:
Postgresql supports upserts : https://www.postgresql.org/docs/11/sql-insert.html "ON CONFLICT"
BR
Dirk
--
Dirk Mika
Software Developer
mika:timing GmbH
Strundepark - Kürtener Str. 11b
51465 Bergisch Gladbach
Germanyfon +49 2202 2401-1197
dirk.mika@mikatiming.de
www.mikatiming.deAG Köln HRB 47509 * WEEE-Reg.-Nr. DE 90029884
Geschäftsführer: Harald Mika, Jörg Mika
--
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt
Attachment
Dirk Mika schrieb am 01.07.2019 um 12:02: > I know I can join using the pk, but in oracle using the rowid is > faster so I wanted to know, if this is possible in PostgreSQL as > well. Well, in Postgres ctid is not necessarily faster.
Dirk Mika schrieb am 01.07.2019 um 13:18: > The problem with the INSERT ON CONFLICT is that an insert is tried here first, which may fire triggers. > > In my case there is a fairly expensive INSERT Trigger ON EACH ROW, which I would like to avoid. The insert trigger will only be fired if an INSERT actually takes place. If INSERT ON CONFLICT results in an UPDATE (or a DO NOTHING), no INSERT trigger will be fired.
I've tried it with the following trigger: CREATE TRIGGER tr_tl_test1 BEFORE INSERT ON public.test_large FOR EACH ROW EXECUTE PROCEDURE tf_tr_tl_test1 () The trigger function does nothing special: CREATE OR REPLACE FUNCTION public.tf_tr_tl_test1 () RETURNS trigger LANGUAGE 'plpgsql' VOLATILE NOT LEAKPROOF SECURITY INVOKER PARALLEL UNSAFE AS $$ BEGIN RAISE NOTICE 'Trigger called with: %', new; RETURN new; END; $$ If I do a EXPLAIN ANALYZE INSERT INTO test_large (id) VALUES (2) ON CONFLICT ON CONSTRAINT pk_test_large DO NOTHING; I get the following: NOTICE: Trigger called with: (2,,) QUERY PLAN --------------------------------------------------------------------------------------------------- Insert on test_large (cost=0.00..0.01 rows=1 width=40) (actual time=0.153..0.153 rows=0 loops=1) Conflict Resolution: NOTHING Conflict Arbiter Indexes: pk_test_large Tuples Inserted: 0 Conflicting Tuples: 1 -> Result (cost=0.00..0.01 rows=1 width=40) (actual time=0.003..0.004 rows=1 loops=1) Planning Time: 0.142 ms Trigger tr_tl_test1: time=0.116 calls=1 Execution Time: 0.180 ms As you can see the trigger function is called for the row I try to insert, but you can also see that there's no tuple insertedbut one conflicting. Dirk -- Dirk Mika Software Developer mika:timing GmbH Strundepark - Kürtener Str. 11b 51465 Bergisch Gladbach Germany fon +49 2202 2401-1197 dirk.mika@mikatiming.de www.mikatiming.de AG Köln HRB 47509 * WEEE-Reg.-Nr. DE 90029884 Geschäftsführer: Harald Mika, Jörg Mika ## How2Use ## the ChampionChip by mika:timing ## https://youtu.be/qfOFXrpSKLQ Am 01.07.19, 13:33 schrieb "Thomas Kellerer" <spam_eater@gmx.net>: Dirk Mika schrieb am 01.07.2019 um 13:18: > The problem with the INSERT ON CONFLICT is that an insert is tried here first, which may fire triggers. > > In my case there is a fairly expensive INSERT Trigger ON EACH ROW, which I would like to avoid. The insert trigger will only be fired if an INSERT actually takes place. If INSERT ON CONFLICT results in an UPDATE (or a DO NOTHING), no INSERT trigger will be fired.
Hello Dirk, pls don't top post, that's the rule here On 1/7/19 2:40 μ.μ., Dirk Mika wrote: > I've tried it with the following trigger: > > CREATE TRIGGER tr_tl_test1 > BEFORE INSERT > ON public.test_large > FOR EACH ROW > EXECUTE PROCEDURE tf_tr_tl_test1 () > > The trigger function does nothing special: > > CREATE OR REPLACE FUNCTION public.tf_tr_tl_test1 () > RETURNS trigger > LANGUAGE 'plpgsql' > VOLATILE > NOT LEAKPROOF > SECURITY INVOKER > PARALLEL UNSAFE > AS > $$ > BEGIN > RAISE NOTICE 'Trigger called with: %', new; > RETURN new; > END; > $$ > > If I do a > > EXPLAIN ANALYZE > INSERT INTO test_large (id) > VALUES (2) > ON CONFLICT > ON CONSTRAINT pk_test_large > DO NOTHING; > > I get the following: > > NOTICE: Trigger called with: (2,,) > QUERY PLAN > --------------------------------------------------------------------------------------------------- > Insert on test_large (cost=0.00..0.01 rows=1 width=40) (actual time=0.153..0.153 rows=0 loops=1) > Conflict Resolution: NOTHING > Conflict Arbiter Indexes: pk_test_large > Tuples Inserted: 0 > Conflicting Tuples: 1 > -> Result (cost=0.00..0.01 rows=1 width=40) (actual time=0.003..0.004 rows=1 loops=1) > Planning Time: 0.142 ms > Trigger tr_tl_test1: time=0.116 calls=1 > Execution Time: 0.180 ms > > As you can see the trigger function is called for the row I try to insert, but you can also see that there's no tuple insertedbut one conflicting. A before trigger cannot know the actual. So a BEFORE INSERT trigger will behave as such even if no INSERT takes place. If you want to skip your particular "normal" trigger , just run : set session_replication_role to 'replica'; and run your upsert. > > Dirk > > > > -- > Dirk Mika > Software Developer > > mika:timing GmbH > Strundepark - Kürtener Str. 11b > 51465 Bergisch Gladbach > Germany > > fon +49 2202 2401-1197 > dirk.mika@mikatiming.de > www.mikatiming.de > > AG Köln HRB 47509 * WEEE-Reg.-Nr. DE 90029884 > Geschäftsführer: Harald Mika, Jörg Mika > > > > ## How2Use > ## the ChampionChip by mika:timing > ## https://youtu.be/qfOFXrpSKLQ > Am 01.07.19, 13:33 schrieb "Thomas Kellerer" <spam_eater@gmx.net>: > > Dirk Mika schrieb am 01.07.2019 um 13:18: > > The problem with the INSERT ON CONFLICT is that an insert is tried here first, which may fire triggers. > > > > In my case there is a fairly expensive INSERT Trigger ON EACH ROW, which I would like to avoid. > > The insert trigger will only be fired if an INSERT actually takes place. > > If INSERT ON CONFLICT results in an UPDATE (or a DO NOTHING), no INSERT trigger will be fired. > > > > > -- Achilleas Mantzios IT DEV Lead IT DEPT Dynacom Tankers Mgmt
> As you can see the trigger function is called for the row I try to > insert, but you can also see that there's no tuple inserted but one > conflicting. Ah, right. Thinking about it, it _has_ to call any BEFORE trigger function as that might change values of the row to be inserted that could possibly change the outcome of the test.
Hi Thinking about it, it _has_ to call any BEFORE trigger function as that might change values of the row to be inserted that could possibly change the outcome of the test. Yeah, that was my thought, too. Unfortunately, the affected trigger changes two columns, so I can't change it to an AFTERROW trigger. Dirk -- Dirk Mika Software Developer mika:timing GmbH Strundepark - Kürtener Str. 11b 51465 Bergisch Gladbach Germany fon +49 2202 2401-1197 dirk.mika@mikatiming.de www.mikatiming.de AG Köln HRB 47509 * WEEE-Reg.-Nr. DE 90029884 Geschäftsführer: Harald Mika, Jörg Mika ## How2Use ## the ChampionChip by mika:timing ## https://youtu.be/qfOFXrpSKLQ
Dirk Mika <Dirk.Mika@mikatiming.de> writes: > I know I can join using the pk, but in oracle using the rowid is faster so I wanted to know, if this is possible in PostgreSQLas well. Existing Postgres releases are not very bright about joins on CTID --- basically merge join is the only plan type you can get for that. v12 will improve that somewhat. regards, tom lane