Re: Conflicting updates of command progress - Mailing list pgsql-hackers

From Sami Imseih
Subject Re: Conflicting updates of command progress
Msg-id CAA5RZ0v5tYwsdSxQaFUh0xkHPnY62m55qZ=a7k6EewbP-zCMvA@mail.gmail.com
In response to Re: Conflicting updates of command progress  (Antonin Houska <ah@cybertec.at>)
Responses Re: Conflicting updates of command progress
List pgsql-hackers
>> pgstat_progress_start_command should only be called once by the entry
>> point for the
>> command. In theory, we could end up in a situation where start_command
>> is called multiple times during the same top-level command;

> Not only in theory - it actually happens when CLUSTER is rebuilding indexes.

In the case of CLUSTER, pgstat_progress_start_command is only called once,
but pgstat_progress_update_param is called in the context of both CLUSTER
and CREATE INDEX.
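
To illustrate the CLUSTER case, here is a minimal standalone sketch. It is not
the pgstat code: the struct, the mock_* functions, and the slot numbers are all
hypothetical and only model the idea of one fixed-size parameter array per
backend. The two slot macros are made to collide on purpose.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MOCK_NUM_PROGRESS_PARAM 20   /* assumed size of the per-backend array */

typedef enum { MOCK_CMD_INVALID = 0, MOCK_CMD_CLUSTER, MOCK_CMD_CREATE_INDEX } MockCommand;

/* Hypothetical stand-in for the progress fields of one backend. */
typedef struct
{
    MockCommand command;
    int64_t     param[MOCK_NUM_PROGRESS_PARAM];
} MockBackendProgress;

static MockBackendProgress mock_progress;

/* Models "start": record the command type and zero all counters. */
static void
mock_start_command(MockCommand cmd)
{
    mock_progress.command = cmd;
    memset(mock_progress.param, 0, sizeof(mock_progress.param));
}

/* Models "update": write one slot, with no idea who owns that slot. */
static void
mock_update_param(int index, int64_t val)
{
    assert(index >= 0 && index < MOCK_NUM_PROGRESS_PARAM);
    mock_progress.param[index] = val;
}

/* Hypothetical slot numbers that happen to overlap. */
#define MOCK_CLUSTER_TUPLES_SCANNED  2
#define MOCK_CREATEIDX_BLOCKS_DONE   2

/* Returns what a monitoring view would read as "CLUSTER tuples scanned". */
int64_t
demo_cluster_conflict(void)
{
    mock_start_command(MOCK_CMD_CLUSTER);          /* called once, by CLUSTER */
    mock_update_param(MOCK_CLUSTER_TUPLES_SCANNED, 1000);

    /* Index rebuild: CREATE INDEX code reports into the same array. */
    mock_update_param(MOCK_CREATEIDX_BLOCKS_DONE, 7);

    return mock_progress.param[MOCK_CLUSTER_TUPLES_SCANNED];
}
```

In this model demo_cluster_conflict() returns 7 rather than 1000: the command
is still reported as CLUSTER, but one of its counters now holds a value written
by the index build.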

> That's a possible approach. However, if the index build takes long time in the
> CLUSTER case, the user will probably be interested in details about the index
> build.

I agree.

>> Is there a repro that you can share that shows the weird values? It sounds like
>> the repro is on top of [1]. Is that right?

>> You can reproduce the similar problem by creating a trigger function that
>> runs a progress-reporting command like COPY, and then COPY data into
>> a table that uses that trigger.

>> [2] https://commitfest.postgresql.org/patch/5282/

In this case, pgstat_progress_start_command is actually called
twice in the life of a single COPY command: the upper-level COPY
command calls pgstat_progress_start_command, and then the nested COPY
command calls pgstat_progress_start_command as well.
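
A simplified model of that nested COPY sequence, again as a hedged sketch
rather than the real implementation: the nest_* names, the relation ids 100
and 200, and the slot number are hypothetical. The model assumes that "start"
resets the counters and that "end" marks the backend as running no command.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NEST_NUM_PARAM 20            /* assumed size of the parameter array */

typedef enum { NEST_CMD_INVALID = 0, NEST_CMD_COPY } NestCommand;

/* Hypothetical single progress slot of one backend. */
typedef struct
{
    NestCommand command;
    uint32_t    target_relid;
    int64_t     param[NEST_NUM_PARAM];
} NestProgress;

static NestProgress nest_slot;

static void
nest_start_command(NestCommand cmd, uint32_t relid)
{
    nest_slot.command = cmd;
    nest_slot.target_relid = relid;
    memset(nest_slot.param, 0, sizeof(nest_slot.param));
}

static void
nest_update_param(int index, int64_t val)
{
    assert(index >= 0 && index < NEST_NUM_PARAM);
    nest_slot.param[index] = val;
}

static void
nest_end_command(void)
{
    nest_slot.command = NEST_CMD_INVALID;
    nest_slot.target_relid = 0;
}

#define NEST_COPY_TUPLES_PROCESSED 0   /* hypothetical slot number */

/*
 * Outer COPY into relation 100 fires a trigger that runs a nested COPY
 * into relation 200. Returns 1 if the outer COPY, while still running,
 * is no longer visible as an active command.
 */
int
nest_demo(void)
{
    nest_start_command(NEST_CMD_COPY, 100);            /* outer COPY */
    nest_update_param(NEST_COPY_TUPLES_PROCESSED, 5);

    nest_start_command(NEST_CMD_COPY, 200);            /* nested COPY: second start */
    nest_update_param(NEST_COPY_TUPLES_PROCESSED, 1);
    nest_end_command();                                /* nested COPY finishes */

    nest_update_param(NEST_COPY_TUPLES_PROCESSED, 6);  /* outer COPY continues */

    return nest_slot.command == NEST_CMD_INVALID && nest_slot.target_relid == 0;
}
```

Under these assumptions nest_demo() returns 1: the second start overwrites the
outer command's target and counters, and the nested end leaves the still-running
outer COPY without a visible progress entry.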

> I think that can be implemented by moving the progress related fields from
> PgBackendStatus into a new structure and by teaching the backend to insert a
> new instance of that structure into a shared hash table (dshash.c)

I think it is a good idea in general to move the backend progress to
shared memory, together with a new API that deals with the scenarios
described above:
1/ an (explicit) nested command was started by a top-level command, such
as the COPY case above.
2/ a top-level command triggered some other progress code implicitly, such as
CLUSTER triggering CREATE INDEX code.
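
One possible shape for an API that covers both scenarios, purely as a sketch:
a per-backend stack of progress entries where each update names the command it
belongs to. Nothing here is an existing or proposed PostgreSQL interface; the
stk_* names, the stack depth, and the slot number are assumptions, and the
sketch uses a local array instead of a shared hash table to stay self-contained.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STK_MAX_DEPTH 8              /* assumed nesting limit */
#define STK_NUM_PARAM 20             /* assumed size of each parameter array */

typedef enum { STK_CMD_INVALID = 0, STK_CMD_CLUSTER, STK_CMD_CREATE_INDEX, STK_CMD_COPY } StkCommand;

typedef struct
{
    StkCommand command;
    int64_t    param[STK_NUM_PARAM];
} StkEntry;

static StkEntry stk_entries[STK_MAX_DEPTH];
static int      stk_depth = 0;

/* Push a new entry; returns its position, or -1 if the stack is full. */
static int
stk_start_command(StkCommand cmd)
{
    if (stk_depth >= STK_MAX_DEPTH)
        return -1;
    stk_entries[stk_depth].command = cmd;
    memset(stk_entries[stk_depth].param, 0, sizeof(stk_entries[stk_depth].param));
    return stk_depth++;
}

/*
 * Update the innermost entry of the given command type.
 * Returns 1 if applied, 0 if no such command is in progress.
 */
static int
stk_update_param(StkCommand cmd, int index, int64_t val)
{
    assert(index >= 0 && index < STK_NUM_PARAM);
    for (int i = stk_depth - 1; i >= 0; i--)
    {
        if (stk_entries[i].command == cmd)
        {
            stk_entries[i].param[index] = val;
            return 1;
        }
    }
    return 0;
}

/* Pop the innermost entry; outer entries stay untouched. */
static void
stk_end_command(void)
{
    if (stk_depth > 0)
        stk_depth--;
}

/* Returns the CLUSTER counter after index-build code has reported too. */
int64_t
stk_demo(void)
{
    int applied;

    stk_depth = 0;
    stk_start_command(STK_CMD_CLUSTER);
    stk_update_param(STK_CMD_CLUSTER, 2, 1000);

    /* Scenario 2: implicit CREATE INDEX reporting with no entry is dropped. */
    applied = stk_update_param(STK_CMD_CREATE_INDEX, 2, 7);
    assert(applied == 0);

    /* Scenario 1: an explicitly started nested command gets its own entry. */
    stk_start_command(STK_CMD_CREATE_INDEX);
    stk_update_param(STK_CMD_CREATE_INDEX, 2, 7);
    stk_end_command();

    return stk_entries[0].param[2];
}
```

Here stk_demo() returns 1000, so the CLUSTER counter survives both cases.
Dropping the implicit update is only one choice; if users want index build
details during CLUSTER, the caller could push a CREATE INDEX entry instead.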

I also like the shared memory approach because we would then no longer need
a message like the one introduced in f1889729dd3ab0 to support parallel index
vacuum progress (46ebdfe164c61).


-- 
Sami Imseih
Amazon Web Services (AWS)


