Thread: Use array as object (src/fe_utils/parallel_slot.c)

Use array as object (src/fe_utils/parallel_slot.c)

From
Ranier Vilela
Date:
Hi,

One other case suspicious, which I think deserves a conference.
At function wait_on_slots (src/fe_utils/parallel_slot.c)
The variable "slots" are array, but at function call SetCancelConn,
"slots" are used as an object, which at the very least would be suspicious.

cancelconn wouldn't that be the correct argument?

regards,
Ranier Vilela
Attachment

Re: Use array as object (src/fe_utils/parallel_slot.c)

From
Ranier Vilela
Date:
Em qui., 11 de ago. de 2022 às 09:52, Ranier Vilela <ranier.vf@gmail.com> escreveu:
Hi,

One other case suspicious, which I think deserves a conference.
At function wait_on_slots (src/fe_utils/parallel_slot.c)
The variable "slots" are array, but at function call SetCancelConn,
"slots" are used as an object, which at the very least would be suspicious.
Introduced the affected function.
I'm not sure you're having problems, but using arrays as single pointer is not recommended.

regards,
Ranier Vilela

Re: Use array as object (src/fe_utils/parallel_slot.c)

From
Justin Pryzby
Date:
On Fri, Aug 19, 2022 at 03:52:36PM -0300, Ranier Vilela wrote:
> Em qui., 11 de ago. de 2022 às 09:52, Ranier Vilela <ranier.vf@gmail.com> escreveu:
> 
> > Hi,
> >
> > One other case suspicious, which I think deserves a conference.
> > At function wait_on_slots (src/fe_utils/parallel_slot.c)
> > The variable "slots" are array, but at function call SetCancelConn,
> > "slots" are used as an object, which at the very least would be suspicious.
>
> The commit
> https://github.com/postgres/postgres/commit/f71519e545a34ece0a27c8bb1a2b6e197d323163
> Introduced the affected function.

It's true that the function was added there, but SetCancelConn() was called the
same way before that: SetCancelConn(slots->connection);

If you trace the history back to a17923204, you'll see a comment about the
"zeroth slot", which makes it clear that the first slot it what's intended.

I agree that it would be clearer if this were written as slots[0].connection.

-- 
Justin



Re: Use array as object (src/fe_utils/parallel_slot.c)

From
Ranier Vilela
Date:
Em sex., 19 de ago. de 2022 às 16:22, Justin Pryzby <pryzby@telsasoft.com> escreveu:
On Fri, Aug 19, 2022 at 03:52:36PM -0300, Ranier Vilela wrote:
> Em qui., 11 de ago. de 2022 às 09:52, Ranier Vilela <ranier.vf@gmail.com> escreveu:
>
> > Hi,
> >
> > One other case suspicious, which I think deserves a conference.
> > At function wait_on_slots (src/fe_utils/parallel_slot.c)
> > The variable "slots" are array, but at function call SetCancelConn,
> > "slots" are used as an object, which at the very least would be suspicious.
>
> The commit
> https://github.com/postgres/postgres/commit/f71519e545a34ece0a27c8bb1a2b6e197d323163
> Introduced the affected function.

It's true that the function was added there, but SetCancelConn() was called the
same way before that: SetCancelConn(slots->connection);

If you trace the history back to a17923204, you'll see a comment about the
"zeroth slot", which makes it clear that the first slot it what's intended.
Thank you Justin, for the research.

I agree that it would be clearer if this were written as slots[0].connection.
But I still think that the new variable introduced,  "cancelconn", became the real argument.

regards,
Ranier Vilela

Re: Use array as object (src/fe_utils/parallel_slot.c)

From
Michael Paquier
Date:
On Fri, Aug 19, 2022 at 02:22:32PM -0500, Justin Pryzby wrote:
> If you trace the history back to a17923204, you'll see a comment about the
> "zeroth slot", which makes it clear that the first slot it what's intended.
>
> I agree that it would be clearer if this were written as slots[0].connection.

Based on the way the code is written on HEAD, this would be the
correct assumption.  Now, calling PQgetCancel() would return NULL for
a connection that we actually ignore in the code a couple of lines
above when it has PGINVALID_SOCKET.  So it seems to me that the
suggestion of using "cancelconn", which would be the first valid
connection, rather than always the first connection, which may be
using an invalid socket, is correct, so as we always have our hands
on a way to cancel a command.
--
Michael

Attachment

Re: Use array as object (src/fe_utils/parallel_slot.c)

From
Ranier Vilela
Date:
Em dom., 21 de ago. de 2022 às 21:15, Michael Paquier <michael@paquier.xyz> escreveu:
On Fri, Aug 19, 2022 at 02:22:32PM -0500, Justin Pryzby wrote:
> If you trace the history back to a17923204, you'll see a comment about the
> "zeroth slot", which makes it clear that the first slot it what's intended.
>
> I agree that it would be clearer if this were written as slots[0].connection.

Based on the way the code is written on HEAD, this would be the
correct assumption.  Now, calling PQgetCancel() would return NULL for
a connection that we actually ignore in the code a couple of lines
above when it has PGINVALID_SOCKET.  So it seems to me that the
suggestion of using "cancelconn", which would be the first valid
connection, rather than always the first connection, which may be
using an invalid socket, is correct, so as we always have our hands
on a way to cancel a command.
Thanks Michael, for looking at this.
Is it worth creating a commiffest?

regards,
Ranier Vilela

Re: Use array as object (src/fe_utils/parallel_slot.c)

From
Michael Paquier
Date:
On Fri, Aug 26, 2022 at 01:54:26PM -0300, Ranier Vilela wrote:
> Is it worth creating a commiffest?

Don't think so, but feel free to create one and mark me as committer
if you think that's appropriate.  I have marked this thread as
something to do soon-ishly, but I am being distracted by life this
month.
--
Michael

Attachment

Re: Use array as object (src/fe_utils/parallel_slot.c)

From
Ranier Vilela
Date:
Em sáb., 27 de ago. de 2022 às 00:00, Michael Paquier <michael@paquier.xyz> escreveu:
On Fri, Aug 26, 2022 at 01:54:26PM -0300, Ranier Vilela wrote:
> Is it worth creating a commiffest?

Don't think so, but feel free to create one and mark me as committer
if you think that's appropriate.  I have marked this thread as
something to do soon-ishly
Hi Michael, I see the commit.
Thanks for the hardest part.
Suspecting something wrong is easy, the difficult thing is to describe why it is wrong.

, but I am being distracted by life this
month.
Glad to know, enjoy.

regards,
Ranier Vilela

Re: Use array as object (src/fe_utils/parallel_slot.c)

From
Tom Lane
Date:
Michael Paquier <michael@paquier.xyz> writes:
> Based on the way the code is written on HEAD, this would be the
> correct assumption.  Now, calling PQgetCancel() would return NULL for
> a connection that we actually ignore in the code a couple of lines
> above when it has PGINVALID_SOCKET.  So it seems to me that the
> suggestion of using "cancelconn", which would be the first valid
> connection, rather than always the first connection, which may be
> using an invalid socket, is correct, so as we always have our hands
> on a way to cancel a command.

I came across this commit (52144b6fc) while writing release notes,
and I have to seriously question whether it's right yet.  The thing
that needs to be asked is, if we get a SIGINT in a program using this
logic, why would we propagate a cancel to just one of the controlled
sessions and not all of them?

It looks to me like the original concept was that slot zero would be
a "master" connection, such that canceling just that one would have a
useful effect.  Maybe the current users of parallel_slot.c still use
it like that, but I bet it's more likely that the connections are
all doing independent things and you really gotta cancel them all
if you want out.

I suppose maybe this commit improved matters: if you are running N jobs
then typing control-C N times (not too quickly) might eventually get
you out, by successively canceling the lowest-numbered surviving
connection.  Previously you could have pounded the key all day and
not gotten rid of any but the zero'th task.  OTOH, if the connections
don't exit but just go back to idle, which seems pretty possible,
then it's not clear we've moved the needle at all.

Anyway I think this needs rewritten, not just tweaked.  The cancel.c
infrastructure is really next to useless here since it is only designed
with one connection in mind.  I'm inclined to think we should only
expect the signal handler to set CancelRequested, and then manually
issue cancels to all live connections when we see that become set.

I'm not proposing reverting 52144b6fc, because I doubt it made
anything worse; but I'm thinking of leaving it out of the release
notes, because I'm unsure it had any user-visible effect at all.
It doesn't look to me like we'd ever get to wait_on_slots unless
all the connections are known busy.

            regards, tom lane