Thread: Use array as object (src/fe_utils/parallel_slot.c)
Hi,
One other case suspicious, which I think deserves a conference.
At function wait_on_slots (src/fe_utils/parallel_slot.c)
The variable "slots" are array, but at function call SetCancelConn,
"slots" are used as an object, which at the very least would be suspicious.
cancelconn wouldn't that be the correct argument?
regards,
Ranier Vilela
Attachment
Em qui., 11 de ago. de 2022 às 09:52, Ranier Vilela <ranier.vf@gmail.com> escreveu:
Hi,One other case suspicious, which I think deserves a conference.At function wait_on_slots (src/fe_utils/parallel_slot.c)The variable "slots" are array, but at function call SetCancelConn,"slots" are used as an object, which at the very least would be suspicious.
Introduced the affected function.
I'm not sure you're having problems, but using arrays as single pointer is not recommended.
I'm not sure you're having problems, but using arrays as single pointer is not recommended.
Ranier Vilela
On Fri, Aug 19, 2022 at 03:52:36PM -0300, Ranier Vilela wrote: > Em qui., 11 de ago. de 2022 às 09:52, Ranier Vilela <ranier.vf@gmail.com> escreveu: > > > Hi, > > > > One other case suspicious, which I think deserves a conference. > > At function wait_on_slots (src/fe_utils/parallel_slot.c) > > The variable "slots" are array, but at function call SetCancelConn, > > "slots" are used as an object, which at the very least would be suspicious. > > The commit > https://github.com/postgres/postgres/commit/f71519e545a34ece0a27c8bb1a2b6e197d323163 > Introduced the affected function. It's true that the function was added there, but SetCancelConn() was called the same way before that: SetCancelConn(slots->connection); If you trace the history back to a17923204, you'll see a comment about the "zeroth slot", which makes it clear that the first slot it what's intended. I agree that it would be clearer if this were written as slots[0].connection. -- Justin
Em sex., 19 de ago. de 2022 às 16:22, Justin Pryzby <pryzby@telsasoft.com> escreveu:
On Fri, Aug 19, 2022 at 03:52:36PM -0300, Ranier Vilela wrote:
> Em qui., 11 de ago. de 2022 às 09:52, Ranier Vilela <ranier.vf@gmail.com> escreveu:
>
> > Hi,
> >
> > One other case suspicious, which I think deserves a conference.
> > At function wait_on_slots (src/fe_utils/parallel_slot.c)
> > The variable "slots" are array, but at function call SetCancelConn,
> > "slots" are used as an object, which at the very least would be suspicious.
>
> The commit
> https://github.com/postgres/postgres/commit/f71519e545a34ece0a27c8bb1a2b6e197d323163
> Introduced the affected function.
It's true that the function was added there, but SetCancelConn() was called the
same way before that: SetCancelConn(slots->connection);
If you trace the history back to a17923204, you'll see a comment about the
"zeroth slot", which makes it clear that the first slot it what's intended.
Thank you Justin, for the research.
I agree that it would be clearer if this were written as slots[0].connection.
But I still think that the new variable introduced, "cancelconn", became the real argument.
regards,
Ranier Vilela
On Fri, Aug 19, 2022 at 02:22:32PM -0500, Justin Pryzby wrote: > If you trace the history back to a17923204, you'll see a comment about the > "zeroth slot", which makes it clear that the first slot it what's intended. > > I agree that it would be clearer if this were written as slots[0].connection. Based on the way the code is written on HEAD, this would be the correct assumption. Now, calling PQgetCancel() would return NULL for a connection that we actually ignore in the code a couple of lines above when it has PGINVALID_SOCKET. So it seems to me that the suggestion of using "cancelconn", which would be the first valid connection, rather than always the first connection, which may be using an invalid socket, is correct, so as we always have our hands on a way to cancel a command. -- Michael
Attachment
Em dom., 21 de ago. de 2022 às 21:15, Michael Paquier <michael@paquier.xyz> escreveu:
On Fri, Aug 19, 2022 at 02:22:32PM -0500, Justin Pryzby wrote:
> If you trace the history back to a17923204, you'll see a comment about the
> "zeroth slot", which makes it clear that the first slot it what's intended.
>
> I agree that it would be clearer if this were written as slots[0].connection.
Based on the way the code is written on HEAD, this would be the
correct assumption. Now, calling PQgetCancel() would return NULL for
a connection that we actually ignore in the code a couple of lines
above when it has PGINVALID_SOCKET. So it seems to me that the
suggestion of using "cancelconn", which would be the first valid
connection, rather than always the first connection, which may be
using an invalid socket, is correct, so as we always have our hands
on a way to cancel a command.
Thanks Michael, for looking at this.
Is it worth creating a commiffest?
regards,
Ranier Vilela
On Fri, Aug 26, 2022 at 01:54:26PM -0300, Ranier Vilela wrote: > Is it worth creating a commiffest? Don't think so, but feel free to create one and mark me as committer if you think that's appropriate. I have marked this thread as something to do soon-ishly, but I am being distracted by life this month. -- Michael
Attachment
Em sáb., 27 de ago. de 2022 às 00:00, Michael Paquier <michael@paquier.xyz> escreveu:
On Fri, Aug 26, 2022 at 01:54:26PM -0300, Ranier Vilela wrote:
> Is it worth creating a commiffest?
Don't think so, but feel free to create one and mark me as committer
if you think that's appropriate. I have marked this thread as
something to do soon-ishly
Hi Michael, I see the commit.
Thanks for the hardest part.
Suspecting something wrong is easy, the difficult thing is to describe why it is wrong.
, but I am being distracted by life this
month.
Glad to know, enjoy.
regards,
Ranier Vilela
Michael Paquier <michael@paquier.xyz> writes: > Based on the way the code is written on HEAD, this would be the > correct assumption. Now, calling PQgetCancel() would return NULL for > a connection that we actually ignore in the code a couple of lines > above when it has PGINVALID_SOCKET. So it seems to me that the > suggestion of using "cancelconn", which would be the first valid > connection, rather than always the first connection, which may be > using an invalid socket, is correct, so as we always have our hands > on a way to cancel a command. I came across this commit (52144b6fc) while writing release notes, and I have to seriously question whether it's right yet. The thing that needs to be asked is, if we get a SIGINT in a program using this logic, why would we propagate a cancel to just one of the controlled sessions and not all of them? It looks to me like the original concept was that slot zero would be a "master" connection, such that canceling just that one would have a useful effect. Maybe the current users of parallel_slot.c still use it like that, but I bet it's more likely that the connections are all doing independent things and you really gotta cancel them all if you want out. I suppose maybe this commit improved matters: if you are running N jobs then typing control-C N times (not too quickly) might eventually get you out, by successively canceling the lowest-numbered surviving connection. Previously you could have pounded the key all day and not gotten rid of any but the zero'th task. OTOH, if the connections don't exit but just go back to idle, which seems pretty possible, then it's not clear we've moved the needle at all. Anyway I think this needs rewritten, not just tweaked. The cancel.c infrastructure is really next to useless here since it is only designed with one connection in mind. I'm inclined to think we should only expect the signal handler to set CancelRequested, and then manually issue cancels to all live connections when we see that become set. I'm not proposing reverting 52144b6fc, because I doubt it made anything worse; but I'm thinking of leaving it out of the release notes, because I'm unsure it had any user-visible effect at all. It doesn't look to me like we'd ever get to wait_on_slots unless all the connections are known busy. regards, tom lane