Thread: Use ctid in where clause in update from statement

Use ctid in where clause in update from statement

From

Dirk Mika

Date:

01 July 2019, 09:13:19

Hello,

I come from the Oracle world and am trying to convert some queries to PostgreSQL syntax. One of these queries is a MERGE statement, which I converted into an UPDATE SET FROM WHERE construct. In the original query I use the pseudo column ROWID to match a source row with a target row.

This is a simplified version of such a query:

MERGE INTO test_large d

USING (SELECT ROWID, test_large.*

FROM test_large

WHERE grp = 1) s

ON (d.ROWID = s.ROWID)

WHEN MATCHED

THEN

UPDATE SET d.grp = s.grp;

It has the following execution plan:

---------------------------------------------------------------------------------------------

---------------------------------------------------------------------------------------------

| 0 | MERGE STATEMENT | | 1000 | 16000 | 2006 (0)| 00:00:25 |

| 2 | VIEW | | | | | |

| 3 | NESTED LOOPS | | 1000 | 220K| 2006 (0)| 00:00:25 |

| 4 | TABLE ACCESS BY INDEX ROWID| TEST_LARGE | 1000 | 110K| 1006 (0)| 00:00:13 |

|* 5 | INDEX RANGE SCAN | IX_TL_GRP | 1000 | | 6 (0)| 00:00:01 |

| 6 | TABLE ACCESS BY USER ROWID | TEST_LARGE | 1 | 113 | 1 (0)| 00:00:01 |

---------------------------------------------------------------------------------------------

In the PostgreSQL version I use the column ctid for this. The above query becomes:

UPDATE test_large d

SET grp = s.grp

FROM (SELECT ctid, test_large.*

FROM test_large

WHERE grp = 1) s

WHERE d.ctid = s.ctid;

I noticed that the execution plan for such a statement seems to be suboptimal:

Update on test_large d (cost=155346.98..160367.08 rows=996 width=53)

-> Merge Join (cost=155346.98..160367.08 rows=996 width=53)

Merge Cond: (d.ctid = test_large.ctid)

-> Sort (cost=154330.57..156833.16 rows=1001033 width=43)

Sort Key: d.ctid

-> Seq Scan on test_large d (cost=0.00..19366.33 rows=1001033 width=43)

-> Sort (cost=1016.40..1018.89 rows=996 width=10)

Sort Key: test_large.ctid

-> Index Scan using ix_tl_grp on test_large (cost=0.42..966.80 rows=996 width=10)

Index Cond: (grp = 1)

I expected a tid scan to be used instead of a Seq scan. I did a VACUUM test_large to make sure statistics are valid.

I would suspect that a tid scan is even a bit faster than accessing via the primary key, since reading the index is not required.

I know I could change the WHERE clause to use the primary key, which results in the following plan:

Update on test_large d (cost=0.85..3344.12 rows=995 width=57)

-> Nested Loop (cost=0.85..3344.12 rows=995 width=57)

-> Index Scan using ix_tl_grp on test_large (cost=0.42..994.84 rows=995 width=14)

Index Cond: (grp = 1)

-> Index Scan using pk_test_large on test_large d (cost=0.42..2.36 rows=1 width=47)

Index Cond: (id = test_large.id)

So my question is more "Is a tid scan not possible / useful for an update" than "how do I accelerate this query".

Dirk

--
Dirk Mika
Software Developer

mika:timing GmbH
Strundepark - Kürtener Str. 11b
51465 Bergisch Gladbach
Germany

fon +49 2202 2401-1197
dirk.mika@mikatiming.de
www.mikatiming.de

AG Köln HRB 47509 * WEEE-Reg.-Nr. DE 90029884
Geschäftsführer: Harald Mika, Jörg Mika

Attachment

Re: Use ctid in where clause in update from statement

From

Achilleas Mantzios

Date:

01 July 2019, 09:27:16

On 1/7/19 12:13 μ.μ., Dirk Mika wrote:

Hello,

I come from the Oracle world and am trying to convert some queries to PostgreSQL syntax. One of these queries is a MERGE statement, which I converted into an UPDATE SET FROM WHERE construct. In the original query I use the pseudo column ROWID to match a source row with a target row.
This is a simplified version of such a query:

Postgresql supports upserts : https://www.postgresql.org/docs/11/sql-insert.html "ON CONFLICT"

BR
Dirk

--
Dirk Mika
Software Developer

mika:timing GmbH
Strundepark - Kürtener Str. 11b
51465 Bergisch Gladbach
Germany

fon +49 2202 2401-1197
dirk.mika@mikatiming.de
www.mikatiming.de

AG Köln HRB 47509 * WEEE-Reg.-Nr. DE 90029884
Geschäftsführer: Harald Mika, Jörg Mika

-- 
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt

Attachment

Re: Use ctid in where clause in update from statement

From

Thomas Kellerer

Date:

01 July 2019, 09:35:49

> I come from the Oracle world and am trying to convert some queries to
> PostgreSQL syntax. One of these queries is a MERGE statement, which I
> converted into an UPDATE SET FROM WHERE construct. In the original
> query I use the pseudo column ROWID to match a source row with a
> target row.
> 
> In the PostgreSQL version I use the column ctid for this. The above query becomes:
> 
> *UPDATE*test_large d
>    *SET* grp = s.grp
>   *FROM* (*SELECT* ctid, test_large.*
>           *FROM* test_large
>          *WHERE* grp = 1) s
>  *WHERE* d.ctid = s.ctid;

Why don't you join on the primary key column? 
The ctid comparison is typically quite slow.

Thomas

Re: Use ctid in where clause in update from statement

From

Achilleas Mantzios

Date:

01 July 2019, 09:50:22

On 1/7/19 12:13 μ.μ., Dirk Mika wrote:

UPDATE test_large d
SET grp = s.grp
FROM (SELECT ctid, test_large.*
FROM test_large
WHERE grp = 1) s
WHERE d.ctid = s.ctid;

Besides, what the above does is select for rows that have grp=1 and then set grp to the same value.

BR
Dirk

--
Dirk Mika
Software Developer

mika:timing GmbH
Strundepark - Kürtener Str. 11b
51465 Bergisch Gladbach
Germany

fon +49 2202 2401-1197
dirk.mika@mikatiming.de
www.mikatiming.de

AG Köln HRB 47509 * WEEE-Reg.-Nr. DE 90029884
Geschäftsführer: Harald Mika, Jörg Mika

-- 
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt

Attachment

Re: Use ctid in where clause in update from statement

From

Dirk Mika

Date:

01 July 2019, 10:02:21

I know I can join using the pk, but in oracle using the rowid is faster so I wanted to know, if this is possible in
PostgreSQLas well.
 

Dirk


--
Dirk Mika
Software Developer

mika:timing GmbH
Strundepark - Kürtener Str. 11b
51465 Bergisch Gladbach
Germany

fon +49 2202 2401-1197
dirk.mika@mikatiming.de
www.mikatiming.de

AG Köln HRB 47509 * WEEE-Reg.-Nr. DE 90029884
Geschäftsführer: Harald Mika, Jörg Mika



## How2Use
## the ChampionChip by mika:timing
## https://youtu.be/qfOFXrpSKLQ
Am 01.07.19, 11:36 schrieb "Thomas Kellerer" <spam_eater@gmx.net>:

    > I come from the Oracle world and am trying to convert some queries to
    > PostgreSQL syntax. One of these queries is a MERGE statement, which I
    > converted into an UPDATE SET FROM WHERE construct. In the original
    > query I use the pseudo column ROWID to match a source row with a
    > target row.
    >
    > In the PostgreSQL version I use the column ctid for this. The above query becomes:
    >
    > *UPDATE*test_large d
    >    *SET* grp = s.grp
    >   *FROM* (*SELECT* ctid, test_large.*
    >           *FROM* test_large
    >          *WHERE* grp = 1) s
    >  *WHERE* d.ctid = s.ctid;

    Why don't you join on the primary key column?
    The ctid comparison is typically quite slow.

    Thomas

Re: Use ctid in where clause in update from statement

From

Dirk Mika

Date:

01 July 2019, 10:04:35

This is just a very simplified sample and doesn’t makes sense, but it shows the execution plan. The original query is more complex. It selects more values and updates more columns.

Dirk

--
Dirk Mika
Software Developer

mika:timing GmbH
Strundepark - Kürtener Str. 11b
51465 Bergisch Gladbach
Germany

fon +49 2202 2401-1197
dirk.mika@mikatiming.de
www.mikatiming.de

AG Köln HRB 47509 * WEEE-Reg.-Nr. DE 90029884
Geschäftsführer: Harald Mika, Jörg Mika

Von: Achilleas Mantzios <achill@matrix.gatewaynet.com>
Datum: Montag, 1. Juli 2019 um 11:50
An: "pgsql-general@lists.postgresql.org" <pgsql-general@lists.postgresql.org>
Betreff: Re: Use ctid in where clause in update from statement

On 1/7/19 12:13 μ.μ., Dirk Mika wrote:

UPDATE test_large d
SET grp = s.grp
FROM (SELECT ctid, test_large.*
FROM test_large
WHERE grp = 1) s
WHERE d.ctid = s.ctid;

Besides, what the above does is select for rows that have grp=1 and then set grp to the same value.

BR
Dirk

--
Dirk Mika
Software Developer

mika:timing GmbH
Strundepark - Kürtener Str. 11b
51465 Bergisch Gladbach
Germany
fon +49 2202 2401-1197
dirk.mika@mikatiming.de
www.mikatiming.de
AG Köln HRB 47509 * WEEE-Reg.-Nr. DE 90029884
Geschäftsführer: Harald Mika, Jörg Mika

--

Achilleas Mantzios

IT DEV Lead

IT DEPT

Dynacom Tankers Mgmt

Attachment

Re: Use ctid in where clause in update from statement

From

Dirk Mika

Date:

01 July 2019, 11:18:33

Hi,

The problem with the INSERT ON CONFLICT is that an insert is tried here first, which may fire triggers.

In my case there is a fairly expensive INSERT Trigger ON EACH ROW, which I would like to avoid.

Since the SELECT statement only returns records whose PK already exists, I thought an UPDATE FROM would be the better choice.

As I said, with the PK as JOIN condition it works fine too. I'm only interested in why no tid scan is used.

Dirk

--
Dirk Mika
Software Developer

mika:timing GmbH
Strundepark - Kürtener Str. 11b
51465 Bergisch Gladbach
Germany

fon +49 2202 2401-1197
dirk.mika@mikatiming.de
www.mikatiming.de

AG Köln HRB 47509 * WEEE-Reg.-Nr. DE 90029884
Geschäftsführer: Harald Mika, Jörg Mika

Von: Achilleas Mantzios <achill@matrix.gatewaynet.com>
Datum: Montag, 1. Juli 2019 um 11:27
An: "pgsql-general@lists.postgresql.org" <pgsql-general@lists.postgresql.org>
Betreff: Re: Use ctid in where clause in update from statement

On 1/7/19 12:13 μ.μ., Dirk Mika wrote:

Hello,

I come from the Oracle world and am trying to convert some queries to PostgreSQL syntax. One of these queries is a MERGE statement, which I converted into an UPDATE SET FROM WHERE construct. In the original query I use the pseudo column ROWID to match a source row with a target row.
This is a simplified version of such a query:

Postgresql supports upserts : https://www.postgresql.org/docs/11/sql-insert.html "ON CONFLICT"

BR
Dirk

--
Dirk Mika
Software Developer

mika:timing GmbH
Strundepark - Kürtener Str. 11b
51465 Bergisch Gladbach
Germany
fon +49 2202 2401-1197
dirk.mika@mikatiming.de
www.mikatiming.de
AG Köln HRB 47509 * WEEE-Reg.-Nr. DE 90029884
Geschäftsführer: Harald Mika, Jörg Mika

--

Achilleas Mantzios

IT DEV Lead

IT DEPT

Dynacom Tankers Mgmt

Attachment

Re: Use ctid in where clause in update from statement

From

Thomas Kellerer

Date:

01 July 2019, 11:25:22

Dirk Mika schrieb am 01.07.2019 um 12:02:
> I know I can join using the pk, but in oracle using the rowid is
> faster so I wanted to know, if this is possible in PostgreSQL as
> well.
Well, in Postgres  ctid is not necessarily faster.

Re: Use ctid in where clause in update from statement

From

Thomas Kellerer

Date:

01 July 2019, 11:33:06

Dirk Mika schrieb am 01.07.2019 um 13:18:
> The problem with the INSERT ON CONFLICT is that an insert is tried here first, which may fire triggers.
> 
> In my case there is a fairly expensive INSERT Trigger ON EACH ROW, which I would like to avoid.

The insert trigger will only be fired if an INSERT actually takes place. 

If INSERT ON CONFLICT results in an UPDATE (or a DO NOTHING), no INSERT trigger will be fired.

Re: Use ctid in where clause in update from statement

From

Dirk Mika

Date:

01 July 2019, 11:40:19

I've tried it with the following trigger:

CREATE TRIGGER tr_tl_test1
   BEFORE INSERT
   ON public.test_large
   FOR EACH ROW
EXECUTE PROCEDURE tf_tr_tl_test1 ()

The trigger function does nothing special:

CREATE OR REPLACE FUNCTION public.tf_tr_tl_test1 ()
   RETURNS trigger
   LANGUAGE 'plpgsql'
   VOLATILE
   NOT LEAKPROOF
   SECURITY INVOKER
   PARALLEL UNSAFE
AS
$$
BEGIN
   RAISE NOTICE 'Trigger called with: %', new;
   RETURN new;
END;
$$

If I do a

EXPLAIN ANALYZE
    INSERT INTO test_large (id)
         VALUES (2)
    ON CONFLICT
       ON CONSTRAINT pk_test_large
       DO NOTHING;

I get the following:

NOTICE:  Trigger called with: (2,,)
                                            QUERY PLAN
---------------------------------------------------------------------------------------------------
 Insert on test_large  (cost=0.00..0.01 rows=1 width=40) (actual time=0.153..0.153 rows=0 loops=1)
   Conflict Resolution: NOTHING
   Conflict Arbiter Indexes: pk_test_large
   Tuples Inserted: 0
   Conflicting Tuples: 1
   ->  Result  (cost=0.00..0.01 rows=1 width=40) (actual time=0.003..0.004 rows=1 loops=1)
 Planning Time: 0.142 ms
 Trigger tr_tl_test1: time=0.116 calls=1
 Execution Time: 0.180 ms

As you can see the trigger function is called for the row I try to insert, but you can also see that there's no tuple
insertedbut one conflicting.
 

Dirk



--
Dirk Mika
Software Developer

mika:timing GmbH
Strundepark - Kürtener Str. 11b
51465 Bergisch Gladbach
Germany

fon +49 2202 2401-1197
dirk.mika@mikatiming.de
www.mikatiming.de

AG Köln HRB 47509 * WEEE-Reg.-Nr. DE 90029884
Geschäftsführer: Harald Mika, Jörg Mika



## How2Use
## the ChampionChip by mika:timing
## https://youtu.be/qfOFXrpSKLQ
Am 01.07.19, 13:33 schrieb "Thomas Kellerer" <spam_eater@gmx.net>:

    Dirk Mika schrieb am 01.07.2019 um 13:18:
    > The problem with the INSERT ON CONFLICT is that an insert is tried here first, which may fire triggers.
    >
    > In my case there is a fairly expensive INSERT Trigger ON EACH ROW, which I would like to avoid.

    The insert trigger will only be fired if an INSERT actually takes place.

    If INSERT ON CONFLICT results in an UPDATE (or a DO NOTHING), no INSERT trigger will be fired.

Re: Use ctid in where clause in update from statement

From

Achilleas Mantzios

Date:

01 July 2019, 12:04:59

Hello Dirk,

pls don't top post, that's the rule here

On 1/7/19 2:40 μ.μ., Dirk Mika wrote:
> I've tried it with the following trigger:
>
> CREATE TRIGGER tr_tl_test1
>     BEFORE INSERT
>     ON public.test_large
>     FOR EACH ROW
> EXECUTE PROCEDURE tf_tr_tl_test1 ()
>
> The trigger function does nothing special:
>
> CREATE OR REPLACE FUNCTION public.tf_tr_tl_test1 ()
>     RETURNS trigger
>     LANGUAGE 'plpgsql'
>     VOLATILE
>     NOT LEAKPROOF
>     SECURITY INVOKER
>     PARALLEL UNSAFE
> AS
> $$
> BEGIN
>     RAISE NOTICE 'Trigger called with: %', new;
>     RETURN new;
> END;
> $$
>
> If I do a
>
> EXPLAIN ANALYZE
>      INSERT INTO test_large (id)
>           VALUES (2)
>      ON CONFLICT
>         ON CONSTRAINT pk_test_large
>         DO NOTHING;
>
> I get the following:
>
> NOTICE:  Trigger called with: (2,,)
>                                              QUERY PLAN
> ---------------------------------------------------------------------------------------------------
>   Insert on test_large  (cost=0.00..0.01 rows=1 width=40) (actual time=0.153..0.153 rows=0 loops=1)
>     Conflict Resolution: NOTHING
>     Conflict Arbiter Indexes: pk_test_large
>     Tuples Inserted: 0
>     Conflicting Tuples: 1
>     ->  Result  (cost=0.00..0.01 rows=1 width=40) (actual time=0.003..0.004 rows=1 loops=1)
>   Planning Time: 0.142 ms
>   Trigger tr_tl_test1: time=0.116 calls=1
>   Execution Time: 0.180 ms
>
> As you can see the trigger function is called for the row I try to insert, but you can also see that there's no tuple
insertedbut one conflicting.
 

A before trigger cannot know the actual. So a BEFORE INSERT trigger will behave as such even if no INSERT takes place.
If you want to skip your particular "normal" trigger , just run :
set session_replication_role to 'replica';
and run your upsert.

>
> Dirk
>
>
>
> --
> Dirk Mika
> Software Developer
>
> mika:timing GmbH
> Strundepark - Kürtener Str. 11b
> 51465 Bergisch Gladbach
> Germany
>
> fon +49 2202 2401-1197
> dirk.mika@mikatiming.de
> www.mikatiming.de
>
> AG Köln HRB 47509 * WEEE-Reg.-Nr. DE 90029884
> Geschäftsführer: Harald Mika, Jörg Mika
>
>
>
> ## How2Use
> ## the ChampionChip by mika:timing
> ## https://youtu.be/qfOFXrpSKLQ
> Am 01.07.19, 13:33 schrieb "Thomas Kellerer" <spam_eater@gmx.net>:
>
>      Dirk Mika schrieb am 01.07.2019 um 13:18:
>      > The problem with the INSERT ON CONFLICT is that an insert is tried here first, which may fire triggers.
>      >
>      > In my case there is a fairly expensive INSERT Trigger ON EACH ROW, which I would like to avoid.
>
>      The insert trigger will only be fired if an INSERT actually takes place.
>
>      If INSERT ON CONFLICT results in an UPDATE (or a DO NOTHING), no INSERT trigger will be fired.
>
>
>
>
>


-- 
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt

Re: Use ctid in where clause in update from statement

From

Thomas Kellerer

Date:

01 July 2019, 12:08:20

> As you can see the trigger function is called for the row I try to
> insert, but you can also see that there's no tuple inserted but one
> conflicting.

Ah, right. 

Thinking about it, it _has_ to call any BEFORE trigger function 
as that might change values of the row to be inserted that 
could possibly change the outcome of the test.

Re: Use ctid in where clause in update from statement

From

Dirk Mika

Date:

01 July 2019, 12:13:25

Hi

    Thinking about it, it _has_ to call any BEFORE trigger function
    as that might change values of the row to be inserted that
    could possibly change the outcome of the test.

Yeah, that was my thought, too. Unfortunately, the affected trigger changes two columns, so I can't change it to an
AFTERROW trigger.
 

Dirk







--
Dirk Mika
Software Developer

mika:timing GmbH
Strundepark - Kürtener Str. 11b
51465 Bergisch Gladbach
Germany

fon +49 2202 2401-1197
dirk.mika@mikatiming.de
www.mikatiming.de

AG Köln HRB 47509 * WEEE-Reg.-Nr. DE 90029884
Geschäftsführer: Harald Mika, Jörg Mika



## How2Use
## the ChampionChip by mika:timing
## https://youtu.be/qfOFXrpSKLQ

Re: Use ctid in where clause in update from statement

From

Tom Lane

Date:

01 July 2019, 14:19:39

Dirk Mika <Dirk.Mika@mikatiming.de> writes:
> I know I can join using the pk, but in oracle using the rowid is faster so I wanted to know, if this is possible in
PostgreSQLas well. 

Existing Postgres releases are not very bright about joins on CTID ---
basically merge join is the only plan type you can get for that.
v12 will improve that somewhat.

            regards, tom lane