Thread: Why do plpython functions increase the transaction counter much more than plpgsql functions?

I'm trying to understand why a plpython function has a much bigger impact on the transaction counter in Postgres than a plpgsql function. Below is an example which uses two functions:

Version with plpgsql (each step done in a separate transaction, one after another):
 - check txid_current
 - run a SQL query which calls the `f1_plpgsql` function, which in turn calls the `insert_row_to_db` function 100 times
 - check txid_current

Then we compare the txid_current values. The difference is 2, which means the whole query with its 100 calls to `f1_plpgsql` and `insert_row_to_db` increased the transaction counter by only 1 (the second txid_current call itself accounts for the other).
 
Here is the code:
```
CREATE TABLE insert_rows_table(
    i BIGINT
);

CREATE OR REPLACE FUNCTION insert_row_to_db(i BIGINT)
RETURNS VOID
AS $$
BEGIN
    INSERT INTO insert_rows_table SELECT i;
END
$$ LANGUAGE plpgsql SECURITY DEFINER VOLATILE PARALLEL UNSAFE;


CREATE OR REPLACE FUNCTION f1_plpgsql(i BIGINT)
  RETURNS bigint
AS $$
BEGIN
    PERFORM insert_row_to_db(i);
    RETURN i;
END
$$ LANGUAGE plpgsql SECURITY DEFINER VOLATILE PARALLEL UNSAFE;


SELECT txid_current();
SELECT f1_plpgsql(i::BIGINT) FROM generate_series(1,100) as i;
SELECT txid_current();
```

Example output:

txid_current
500

f1_plpgsql
1
2
...
99
100

txid_current
502


Here is a code reproduction on db-fiddle: https://www.db-fiddle.com/f/4jyoMCicNSZpjMt4jFYoz5/15135

Now let's replace `f1_plpgsql` with a function written in plpython:

```
CREATE OR REPLACE FUNCTION f1_plpython(i BIGINT)
  RETURNS bigint
AS $$
    rows = plpy.execute("SELECT insert_row_to_db(" + str(i) + ")")
    return i
$$ LANGUAGE plpython3u SECURITY DEFINER VOLATILE PARALLEL UNSAFE;
```

I get:

txid_current
500

f1_plpython
1
2
...
99
100

txid_current
602


This shows that the plpython function advances the transaction counter far more (a difference of 102 vs. 2). Does anyone know why? Is there anything I can do about it?

What's interesting is that this happens only if the function called by plpython makes changes to the DB. When I replace `INSERT INTO insert_rows_table SELECT i;` with `SELECT i`, both the plpython and plpgsql functions behave the same.
 Regards,

Michał Albrycht
 
Because the plpython function is executing dynamic SQL?

On Fri, Nov 8, 2024 at 2:59 AM Michał Albrycht <michalalbrycht@gmail.com> wrote:
> I'm trying to understand why a plpython function has a much bigger impact
> on the transaction counter in Postgres than a plpgsql function. [...]
Michał Albrycht <michalalbrycht@gmail.com> writes:
> This proves that the plpython function affects the transaction counter much
> more. Does anyone know why?

A quick look at PLy_spi_execute_query shows that it runs each command
in a subtransaction.  It pretty much has to, because the coding rules
for a Python method don't permit it to just longjmp out of the Python
interpreter, so it has to set up a subtransaction so it can catch any
error.  In this example, each subtransaction will consume an XID.

The plpgsql example is different because it doesn't trap errors,
hence no subtransaction needed, and all the rows will get inserted
under the XID of the outer command's main transaction.  If you were
to wrap the insert_row_to_db call in BEGIN ... EXCEPTION then it'd
consume the same number of XIDs as plpython, for the same reason.
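For concreteness, a wrapped variant might look like this (a sketch; the function name is made up):

```
-- Sketch only: same insert_row_to_db call, but wrapped in an
-- exception block.  Each call now starts a subtransaction to
-- implement the error trap, so each of the 100 calls consumes
-- its own XID, just like the plpython version.
CREATE OR REPLACE FUNCTION f1_plpgsql_trapped(i BIGINT)
  RETURNS bigint
AS $$
BEGIN
    BEGIN
        PERFORM insert_row_to_db(i);
    EXCEPTION WHEN OTHERS THEN
        RAISE;
    END;
    RETURN i;
END
$$ LANGUAGE plpgsql;
```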

> Is there anything I can do about it?

Batch the DB updates, perhaps?
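For instance (a sketch, with made-up names), doing the whole batch in a single plpy.execute call sets up only one subtransaction, hence one extra XID instead of one per row:

```
-- Sketch: one SPI call inserts all n rows, so only one
-- subtransaction (and one XID) is consumed for the batch.
CREATE OR REPLACE FUNCTION insert_batch_plpython(n BIGINT)
  RETURNS bigint
AS $$
    plan = plpy.prepare(
        "INSERT INTO insert_rows_table SELECT generate_series(1, $1)",
        ["bigint"])
    plpy.execute(plan, [n])
    return n
$$ LANGUAGE plpython3u;
```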

> What's interesting it happens only if the function called by plpyhon makes
> changes to DB.

Totally unsurprising.  XIDs are acquired only when the current
transaction or subtransaction first needs to change the DB.
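You can watch this with txid_current_if_assigned(), which returns NULL until the current transaction has actually been assigned an XID:

```
-- Read-only statements do not force XID assignment:
BEGIN;
SELECT txid_current_if_assigned();         -- NULL: no XID yet
SELECT i FROM insert_rows_table;           -- still read-only
SELECT txid_current_if_assigned();         -- still NULL
INSERT INTO insert_rows_table VALUES (1);  -- first DB change
SELECT txid_current_if_assigned();         -- now an XID is assigned
ROLLBACK;
```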

            regards, tom lane



> A quick look at PLy_spi_execute_query shows that it runs each command
> in a subtransaction.  It pretty much has to, because the coding rules
> for a Python method don't permit it to just longjmp out of the Python
> interpreter, so it has to set up a subtransaction so it can catch any
> error.  In this example, each subtransaction will consume an XID.

That makes sense. Thank you very much.
Regards,

Michał Albrycht