Thread: Update concurrency

Update concurrency

From
Sylvain Déve
Date:
Hello,

On a server, I have an API which is called by one or multiple clients. The server can receive, almost at the same time,
multiple update instructions for the same PostgreSQL attribute. These requests do not arrive at the exact same time, and
I would just like to execute them one after another. Because of the (Flask) API, I cannot control whether a previous call
is over, so a new update can be called before the previous one is completed. I use psycopg2 to interface with
PostgreSQL, and I would expect psycopg2/PostgreSQL to be able to queue the updates and process them quietly one after
another (maybe not even in order), but it does not seem to work that way.

Below is a simplified version of the function called on the server when an update is required. The attribute to be
updated is a JSONB object, if that matters. Hence there is 'path' to determine which part of the JSONB object to update.
I copy the Python code, not only the PostgreSQL code, because it might be part of the problem. So the API call looks
like this:

import psycopg2
from psycopg2.extras import Json

def pg_update(data, path):
    conn = psycopg2.connect(...) # always the same database.
    cur = conn.cursor()

    # JSONB update for a single selected row and column.
    # 'column' and 'select_row' are determined with the parameters 'data' and 'path'.
    command = (
        f"""UPDATE MY_TABLE SET """
        f"""{column} = jsonb_merge({column}, %s) """
        f"""WHERE {select_row};"""
    )

    cur.execute(command, [Json(data)])
    conn.commit()

When called twice in a row, this call leads to the error (at "cur.execute"):

    psycopg2.errors.InternalError_: tuple concurrently updated

Since the server keeps running all the time, I can also define conn outside the API call "pg_update" so that all calls
are processed with the same psycopg2 connection. But then, when two updates are requested for the same attribute, I get
instead:

    psycopg2.ProgrammingError: execute cannot be used while an asynchronous query is underway

I forced async_ = False in psycopg2.connect, just in case. No change.
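(Editor's note: one common way to make a single shared connection safe is to serialize all access to it with a lock, so overlapping Flask requests queue up instead of colliding. The sketch below is illustrative, not a psycopg2 API — `SerializedUpdater` is a made-up name, and the connection is whatever `psycopg2.connect(...)` returned:)

```python
import threading

# A psycopg2 connection may be shared between threads, but two threads must
# not execute() on it at the same time.  Taking a lock around the
# execute()/commit() pair makes overlapping requests wait their turn instead
# of failing with "execute cannot be used while an asynchronous query is
# underway".
class SerializedUpdater:
    def __init__(self, conn):
        self._conn = conn            # the one long-lived connection
        self._lock = threading.Lock()

    def update(self, command, params):
        # Only one thread at a time may touch the shared connection.
        with self._lock:
            cur = self._conn.cursor()
            cur.execute(command, params)
            self._conn.commit()
            cur.close()
```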

I also tried to lock like this:
    command = (
        f"""BEGIN; """
        f"""SELECT * FROM MY_TABLE WHERE {select_row} FOR UPDATE; """
        f"""UPDATE MY_TABLE SET """
        f"""{column} = jsonb_merge({column}, %s) """
        f"""WHERE {select_row};"""
        f"""END;"""
    )
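(Editor's note: a variant of this lock attempt that issues each statement separately, letting psycopg2's own implicit transaction handling apply, might look like the following. `pg_update_locked` and its parameters are illustrative; `column` and `select_row` would be built from `data`/`path` as in the original code:)

```python
def pg_update_locked(conn, column, select_row, data):
    # psycopg2 opens a transaction implicitly on the first execute().
    # SELECT ... FOR UPDATE then blocks any other transaction that tries to
    # lock the same row, so concurrent updates are serialized until commit().
    cur = conn.cursor()
    cur.execute(f"SELECT 1 FROM MY_TABLE WHERE {select_row} FOR UPDATE")
    cur.execute(
        f"UPDATE MY_TABLE SET {column} = jsonb_merge({column}, %s) "
        f"WHERE {select_row}",
        [data],
    )
    conn.commit()  # releases the row lock
```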

To summarize, how can I allow for multiple updates of the same attributes, one after another, even when these updates
are requested almost at the same time from independent, uncontrollable client requests? Maybe there is a need for a
better locking mechanism? A queuing mechanism?

Thanks for your help!



Re: Update concurrency

From
"David G. Johnston"
Date:
On Monday, December 20, 2021, Sylvain Déve <sylvain.deve@protonmail.com> wrote:

To summarize, how can I allow for multiple updates of the same attributes, one after another, even when these updates are requested almost at the same time from independent, uncontrollable client requests?

Most people use an application connection pool.  You replace “connect” with a call to get a connection from the pool.  When you are done you release the connection back to the pool.  Concurrency is then done by having a pool size greater than one.  If you try doing more concurrent work than the pool size allows, the pool call should block and/or return nothing, upon which you loop and try again.
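(Editor's note: the pattern David describes can be sketched with a blocking queue; the `factory` argument stands in for `psycopg2.connect`. For real use, psycopg2 ships `psycopg2.pool.ThreadedConnectionPool` with `getconn()`/`putconn()`, though unlike this sketch it raises `PoolError` when exhausted rather than blocking:)

```python
import queue

# Minimal blocking connection pool: getconn() waits until a connection is
# free, which is the "pool call should block" behavior described above.
class ConnectionPool:
    def __init__(self, factory, size):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            self._pool.put(factory())  # e.g. factory = lambda: psycopg2.connect(...)

    def getconn(self, timeout=None):
        # Blocks until another request calls putconn().
        return self._pool.get(timeout=timeout)

    def putconn(self, conn):
        self._pool.put(conn)
```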

David J.

Re: Update concurrency

From
Tom Lane
Date:
Sylvain Déve <sylvain.deve@protonmail.com> writes:
>     # JSONB update for a single selected row and column.
>     # 'column' and 'select_row' are determined with the parameters 'data' and 'path'.
>     command = (
>         f"""UPDATE MY_TABLE SET """
>         f"""{column} = jsonb_merge({column}, %s) """
>         f"""WHERE {select_row};"""
>     )

> When called twice in a row, this call leads to the error (at "cur.execute"):
>     psycopg2.errors.InternalError_: tuple concurrently updated

That error should be pretty hard to reach from user code, and certainly
two successive UPDATEs ought not have anything to do with it.  I think
there is something you're not telling us about the context.  Does this
table have any triggers, weird constraints, or the like?  What is the
actual WHERE clause (ie, I wonder if you have simplified away some
relevant query detail)?  What PG version is this exactly?

            regards, tom lane



Re: Update concurrency

From
Tom Lane
Date:
[ please keep the list cc'd ]

Sylvain Déve <sylvain.deve@protonmail.com> writes:
> Indeed I removed the important part here... I was including a function definition ("create or replace function ...")
> in the call too. This was temporary and dirty. After moving the definition of the function to the initialization of the
> database, it solved everything... Defining the same function multiple times, and I presume more or less at the same
> time, led to problems. The table update is carried out finally without any problem...

Hah, now I can reproduce it:

regression=# create or replace function foo(int) returns int as 'select 1' language sql;
CREATE FUNCTION
regression=# begin;
BEGIN
regression=*# create or replace function foo(int) returns int as 'select 1' language sql;
CREATE FUNCTION

... in another session:

regression=# create or replace function foo(int) returns int as 'select 1' language sql;
<<blocks>>

... in first session:

regression=*# commit;
COMMIT

and now the second session fails with

ERROR:  tuple concurrently updated

because both transactions are trying to update the same pre-existing
row of pg_proc.  (If the function didn't exist to start with, then
you get "duplicate key value violates unique constraint" instead.)

That's basically because internal catalog manipulations don't go
to the same lengths as user queries do to handle concurrent-update
scenarios nicely.  I'm not sure what would be involved in making
that better, but I am sure it'd be a lot of work :-(

            regards, tom lane