Thread: Re: pgsql: Harden new test case against force_parallel_mode = regress.

Re: pgsql: Harden new test case against force_parallel_mode = regress.

From
Robert Haas
Date:
On Thu, Mar 2, 2023 at 5:47 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Harden new test case against force_parallel_mode = regress.
>
> Per buildfarm: worker processes can't see a role created in
> the current transaction.

Now why would that happen? Surely the snapshot for each command is
passed down from leader to worker, and the worker is not free to
invent a snapshot from nothing.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: pgsql: Harden new test case against force_parallel_mode = regress.

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Mar 2, 2023 at 5:47 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Per buildfarm: worker processes can't see a role created in
>> the current transaction.

> Now why would that happen? Surely the snapshot for each command is
> passed down from leader to worker, and the worker is not free to
> invent a snapshot from nothing.

The workers were failing at startup, eg (from [1]):

+ERROR:  role "regress_psql_user" does not exist
+CONTEXT:  while setting parameter "session_authorization" to "regress_psql_user"

Maybe this says that worker startup needs to install the snapshot before
doing any catalog accesses?  Anyway, I'd be happy to revert this test
hack if you care to make the case work.

            regards, tom lane



Re: pgsql: Harden new test case against force_parallel_mode = regress.

From
Matthias van de Meent
Date:
On Fri, 3 Mar 2023 at 17:16, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Thu, Mar 2, 2023 at 5:47 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Harden new test case against force_parallel_mode = regress.
> >
> > Per buildfarm: worker processes can't see a role created in
> > the current transaction.
>
> Now why would that happen? Surely the snapshot for each command is
> passed down from leader to worker, and the worker is not free to
> invent a snapshot from nothing.

Probably because we nitialize which user and database to use in the
backend before we load the parent process' snapshot:

in ParallelWorkerMain (parallel.c, as of HEAD @ b6a0d469):

  /* Restore database connection. */
  BackgroundWorkerInitializeConnectionByOid(fps->database_id,
                                  fps->authenticated_user_id,
                                  0);
[...]

  /* Crank up a transaction state appropriate to a parallel worker. */
  tstatespace = shm_toc_lookup(toc, PARALLEL_KEY_TRANSACTION_STATE, false);
  StartParallelWorkerTransaction(tstatespace);

  /* Restore combo CID state. */
  combocidspace = shm_toc_lookup(toc, PARALLEL_KEY_COMBO_CID, false);
  RestoreComboCIDState(combocidspace);

-Matthias



Re: pgsql: Harden new test case against force_parallel_mode = regress.

From
Tom Lane
Date:
I wrote:
> The workers were failing at startup, eg (from [1]):

argh, forgot to add the link:

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hippopotamus&dt=2023-03-02%2022%3A31%3A17

            regards, tom lane



Re: pgsql: Harden new test case against force_parallel_mode = regress.

From
Robert Haas
Date:
On Fri, Mar 3, 2023 at 11:37 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The workers were failing at startup, eg (from [1]):
>
> +ERROR:  role "regress_psql_user" does not exist
> +CONTEXT:  while setting parameter "session_authorization" to "regress_psql_user"
>
> Maybe this says that worker startup needs to install the snapshot before
> doing any catalog accesses?  Anyway, I'd be happy to revert this test
> hack if you care to make the case work.

Oh, that's interesting (and sad). A parallel worker has a "startup
transaction" that is used to restore library and GUC state, and then
after that transaction commits, it starts up a new transaction that
uses the same snapshot and settings as the transaction in the parallel
leader. So the problem here is that the startup transaction can't see
the uncommitted work of some unrelated (as far as it knows)
transaction, and that prevents restoring the session_authorization
GUC.

That startup transaction has broken stuff before, and it would be nice
to get rid of it. Unfortunately, I don't remember right now why we
need it in the first place. I'm fairly sure that if you load the
library and GUC state without any transaction, that doesn't work,
because a bunch of important processing gets skipped. And I think if
you try to do those things in the "real" transaction that fails for
some reason too, maybe that there's no guarantee that all the relevant
GUCs can be changed at that point, but I'm fuzzy on the details at the
moment.

So I don't know how to fix this right now, but thanks for the details.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: pgsql: Harden new test case against force_parallel_mode = regress.

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Fri, Mar 3, 2023 at 11:37 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> +ERROR:  role "regress_psql_user" does not exist
>> +CONTEXT:  while setting parameter "session_authorization" to "regress_psql_user"

> Oh, that's interesting (and sad). A parallel worker has a "startup
> transaction" that is used to restore library and GUC state, and then
> after that transaction commits, it starts up a new transaction that
> uses the same snapshot and settings as the transaction in the parallel
> leader. So the problem here is that the startup transaction can't see
> the uncommitted work of some unrelated (as far as it knows)
> transaction, and that prevents restoring the session_authorization
> GUC.

Got it.

> That startup transaction has broken stuff before, and it would be nice
> to get rid of it. Unfortunately, I don't remember right now why we
> need it in the first place. I'm fairly sure that if you load the
> library and GUC state without any transaction, that doesn't work,
> because a bunch of important processing gets skipped. And I think if
> you try to do those things in the "real" transaction that fails for
> some reason too, maybe that there's no guarantee that all the relevant
> GUCs can be changed at that point, but I'm fuzzy on the details at the
> moment.

Couldn't we install the leader's snapshot into both transactions?

            regards, tom lane



Re: pgsql: Harden new test case against force_parallel_mode = regress.

From
Robert Haas
Date:
On Fri, Mar 3, 2023 at 12:46 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Couldn't we install the leader's snapshot into both transactions?

Yeah, maybe that would Just Work. Not sure.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: pgsql: Harden new test case against force_parallel_mode = regress.

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Fri, Mar 3, 2023 at 12:46 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Couldn't we install the leader's snapshot into both transactions?

> Yeah, maybe that would Just Work. Not sure.

Well, IIUC the worker is currently getting a brand new snapshot
for its startup transaction, which is exactly what you said upthread
it should never do.  Seems like that could have more failure modes
than just this one.

            regards, tom lane