Re: Is SPI + Procedures (with COMMIT) inside a bgworker broken? - Mailing list pgsql-hackers
From | Fabrízio de Royes Mello |
---|---|
Subject | Re: Is SPI + Procedures (with COMMIT) inside a bgworker broken? |
Date | |
Msg-id | CAFcNs+pTiqVbPJD0Vywfb4H7LA7FKaemcWYce8sCcjH9AWKxNQ@mail.gmail.com Whole thread Raw |
In response to | Re: Is SPI + Procedures (with COMMIT) inside a bgworker broken? (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-hackers |
On Mon, Sep 13, 2021 at 4:30 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> The direct cause of that is that SPI_execute() doesn't permit the called
> query to perform COMMIT/ROLLBACK, which is because most callers would fail
> to cope with that. You can instruct SPI to allow that by replacing the
> SPI_execute() call with something like
>
> SPIExecuteOptions options;
>
> ...
> memset(&options, 0, sizeof(options));
> options.allow_nonatomic = true;
>
> ret = SPI_execute_extended(buf.data, &options);
>
I completely forgot about the SPI execute options... Thanks for the explanation!!!
> However, that's not enough to make this example work :-(.
> I find that it still fails inside the procedure's COMMIT,
> with
>
> 2021-09-13 15:14:54.775 EDT worker_spi[476310] ERROR: portal snapshots (0) did not account for all active snapshots (1)
> 2021-09-13 15:14:54.775 EDT worker_spi[476310] CONTEXT: PL/pgSQL function schema4.counted_proc() line 1 at COMMIT
> SQL statement "CALL "schema4"."counted_proc"()"
>
> I think what this indicates is that worker_spi_main's cavalier
> management of the active snapshot isn't up to snuff for this
> use-case. The error is coming from ForgetPortalSnapshots, which
> is expecting that all active snapshots are attached to Portals;
> but that one isn't.
>
> management of the active snapshot isn't up to snuff for this
> use-case. The error is coming from ForgetPortalSnapshots, which
> is expecting that all active snapshots are attached to Portals;
> but that one isn't.
>
That is exactly the root cause of all my investigation.
At Timescale we have a scheduler (background worker) that launches another background worker to "execute a job", and by executing a job it means to call a function [1] or a procedure [2] directly without a SPI.
But now a user raised an issue about snapshots [3] and when I saw the code for the first time I tried to use SPI and it didn't work as expected.
Even tweaking worker_spi to execute the procedure without SPI by calling ExecuteCallStmt (attached) we end up with the same situation about the active snapshots:
2021-09-13 20:14:36.654 -03 [21483] LOG: worker_spi worker 2 initialized with schema2.counted
2021-09-13 20:14:36.655 -03 [21484] LOG: worker_spi worker 1 initialized with schema1.counted
2021-09-13 20:14:36.657 -03 [21483] ERROR: portal snapshots (0) did not account for all active snapshots (1)
2021-09-13 20:14:36.657 -03 [21483] CONTEXT: PL/pgSQL function schema2.counted_proc() line 1 at COMMIT
2021-09-13 20:14:36.657 -03 [21484] ERROR: portal snapshots (0) did not account for all active snapshots (1)
2021-09-13 20:14:36.657 -03 [21484] CONTEXT: PL/pgSQL function schema1.counted_proc() line 1 at COMMIT
2021-09-13 20:14:36.659 -03 [21476] LOG: background worker "worker_spi" (PID 21483) exited with exit code 1
2021-09-13 20:14:36.659 -03 [21476] LOG: background worker "worker_spi" (PID 21484) exited with exit code 1
2021-09-13 20:14:36.655 -03 [21484] LOG: worker_spi worker 1 initialized with schema1.counted
2021-09-13 20:14:36.657 -03 [21483] ERROR: portal snapshots (0) did not account for all active snapshots (1)
2021-09-13 20:14:36.657 -03 [21483] CONTEXT: PL/pgSQL function schema2.counted_proc() line 1 at COMMIT
2021-09-13 20:14:36.657 -03 [21484] ERROR: portal snapshots (0) did not account for all active snapshots (1)
2021-09-13 20:14:36.657 -03 [21484] CONTEXT: PL/pgSQL function schema1.counted_proc() line 1 at COMMIT
2021-09-13 20:14:36.659 -03 [21476] LOG: background worker "worker_spi" (PID 21483) exited with exit code 1
2021-09-13 20:14:36.659 -03 [21476] LOG: background worker "worker_spi" (PID 21484) exited with exit code 1
> Probably the most appropriate fix is to make worker_spi_main
> set up a Portal to run the query inside of. There are other
> bits of code that are not happy if they're not inside a Portal,
> so if you're hoping to run arbitrary SQL this way, sooner or
> later you're going to have to cross that bridge.
>
> set up a Portal to run the query inside of. There are other
> bits of code that are not happy if they're not inside a Portal,
> so if you're hoping to run arbitrary SQL this way, sooner or
> later you're going to have to cross that bridge.
>
I started digging with it [4] by creating a Portal from scratch to execute the Function or Procedure and it worked.
We're wondering if we can avoid the parser for PortalRun, can we??
Regards,
Fabrízio de Royes Mello
Attachment
pgsql-hackers by date: