Thread: Re: possible bug not in open items

Re: possible bug not in open items

From
Jeff Davis
Date:
On Thu, 2009-03-26 at 21:45 -0400, Bruce Momjian wrote:
> > http://archives.postgresql.org/pgsql-bugs/2009-03/msg00062.php
> >
> > It may or may not be a real bug, but I didn't receive any response. If
> > you think it might be a bug, can you please add it to the open items?
>
> Hmm, odd I don't have it either;  can you repost it?

The docs say:

"SIGINT -- The server disallows new connections and sends all existing
server processes SIGTERM, which will cause them to abort their current
transactions and exit promptly."

http://www.postgresql.org/docs/8.3/static/server-shutdown.html

If you have an open COPY and no data is moving, it simply won't
terminate it. You can terminate it with ctrl-C from psql, but not a
SIGINT to the postmaster or a SIGINT or SIGTERM to the backend.

Regards,
    Jeff Davis

Re: possible bug not in open items

From
Heikki Linnakangas
Date:
Jeff Davis wrote:
> The docs say:
>
> "SIGINT -- The server disallows new connections and sends all existing
> server processes SIGTERM, which will cause them to abort their current
> transactions and exit promptly."
>
> http://www.postgresql.org/docs/8.3/static/server-shutdown.html
>
> If you have an open COPY and no data is moving, it simply won't
> terminate it. You can terminate it with ctrl-C from psql, but not a
> SIGINT to the postmaster or a SIGINT or SIGTERM to the backend.

I actually started to looked at this when you posted, but was pre-empted
with something else. Looking at the code, there's comments that actually
use COPY FROM STDIN as an example of things that should not be
interrupted by signals:

>         /*
>          * (2) Allow asynchronous signals to be executed immediately if they
>          * come in while we are waiting for client input. (This must be
>          * conditional since we don't want, say, reads on behalf of COPY FROM
>          * STDIN doing the same thing.)
>          */
>         QueryCancelPending = false;        /* forget any earlier CANCEL signal */
>         DoingCommandRead = true;
>
>         /*
>          * (3) read a command (loop blocks here)
>          */
>         firstchar = ReadCommand(&input_message);

But in light of your bug report, that doesn't seem quite right. We don't
want to enable notify or catchup interrupts during COPY FROM, but we
should react to fast shutdown and query cancel.

I'm not too familiar with this code, but I think we could just enable
ImmediateInterruptOK in CopyGetData().

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: possible bug not in open items

From
Heikki Linnakangas
Date:
Jeff Davis wrote:
> "SIGINT -- The server disallows new connections and sends all existing
> server processes SIGTERM, which will cause them to abort their current
> transactions and exit promptly."
>
> http://www.postgresql.org/docs/8.3/static/server-shutdown.html
>
> If you have an open COPY and no data is moving, it simply won't
> terminate it. You can terminate it with ctrl-C from psql, but not a
> SIGINT to the postmaster or a SIGINT or SIGTERM to the backend.

Tracking and grepping for pq_get* functions, there's one more place that
does a blocking read like that: reading the function oid and args in a
fastpath function call. Using v2 protocol. That has got to be deprecated
enough to not worry about :-). Then again, it wouldn't be hard to put
set ImmediateInterruptOK there as well, for the sake of completeness.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: possible bug not in open items

From
Tom Lane
Date:
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> I'm not too familiar with this code, but I think we could just enable
> ImmediateInterruptOK in CopyGetData().

Only if you are wanting to break things.

The reason we don't allow client read to be interrupted is the fear of
losing protocol sync on an incomplete message.  For the SIGTERM case
this would (probably) be okay, since we aren't going to pay any more
attention to the client anyway, but accepting SIGINT there is right out.

            regards, tom lane

Re: possible bug not in open items

From
Jeff Davis
Date:
On Fri, 2009-03-27 at 15:43 -0400, Tom Lane wrote:
> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> > I'm not too familiar with this code, but I think we could just enable
> > ImmediateInterruptOK in CopyGetData().
>
> Only if you are wanting to break things.
>
> The reason we don't allow client read to be interrupted is the fear of
> losing protocol sync on an incomplete message.  For the SIGTERM case
> this would (probably) be okay, since we aren't going to pay any more
> attention to the client anyway, but accepting SIGINT there is right out.
>

That's perfectly acceptable to me. I'm only concerned about the shutdown
case, and that's the only case that's in conflict with the docs.

Regards,
    Jeff Davis

Re: possible bug not in open items

From
Jeff Davis
Date:
On Fri, 2009-03-27 at 15:43 -0400, Tom Lane wrote:
> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> > I'm not too familiar with this code, but I think we could just enable
> > ImmediateInterruptOK in CopyGetData().
>
> Only if you are wanting to break things.
>

Doesn't DoingCommandRead protect us in the SIGINT case?

Regards,
    Jeff Davis

Re: possible bug not in open items

From
Bruce Momjian
Date:
Where are we on this?

---------------------------------------------------------------------------

Heikki Linnakangas wrote:
> Jeff Davis wrote:
> > "SIGINT -- The server disallows new connections and sends all existing
> > server processes SIGTERM, which will cause them to abort their current
> > transactions and exit promptly."
> >
> > http://www.postgresql.org/docs/8.3/static/server-shutdown.html
> >
> > If you have an open COPY and no data is moving, it simply won't
> > terminate it. You can terminate it with ctrl-C from psql, but not a
> > SIGINT to the postmaster or a SIGINT or SIGTERM to the backend.
>
> Tracking and grepping for pq_get* functions, there's one more place that
> does a blocking read like that: reading the function oid and args in a
> fastpath function call. Using v2 protocol. That has got to be deprecated
> enough to not worry about :-). Then again, it wouldn't be hard to put
> set ImmediateInterruptOK there as well, for the sake of completeness.
>
> --
>    Heikki Linnakangas
>    EnterpriseDB   http://www.enterprisedb.com

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: possible bug not in open items

From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes:
> Where are we on this?

Pretty much nowhere --- there's no proposed patch, and I don't think
it's exactly trivial.  Do you want to put it on TODO?

            regards, tom lane

Re: possible bug not in open items

From
Jeff Davis
Date:
On Thu, 2009-04-09 at 12:59 -0400, Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > Where are we on this?
>
> Pretty much nowhere --- there's no proposed patch, and I don't think
> it's exactly trivial.  Do you want to put it on TODO?

Here is a patch that does what I think Heikki was suggesting. If a
proper fix is non-trivial, then I assume there's some problem with my
patch, but I'll post it for the archives anyway.

I don't see any obvious protocol synchronization problem, and it looks
like DoingCommandRead protects against that in the case of SIGINT to a
backend. And in the case of SIGTERM to a backend, the connection will be
terminated anyway.

Regards,
    Jeff Davis

Attachment

Re: possible bug not in open items

From
Tom Lane
Date:
Jeff Davis <pgsql@j-davis.com> writes:
> Here is a patch that does what I think Heikki was suggesting. If a
> proper fix is non-trivial, then I assume there's some problem with my
> patch, but I'll post it for the archives anyway.

This patch is so wrong that it's scary.  You can't have
ImmediateInterruptOK true over the duration of any significant amount of
backend processing --- as an example, if you take control away in the
middle of a malloc call, you'll probably be left with a corrupt malloc
arena.

It doesn't even work to try to take control away while control is
inside, say, the OpenSSL code.  The reason we have the
prepare_for_client_read/client_read_ended hooks is to allow the flag to
be turned on over a sufficiently narrow scope --- to wit, the recv()
kernel call and nothing else --- that it's safe.

AFAICS, a safe patch for this has got to involve teaching those hooks
about COPY mode.

            regards, tom lane

Re: possible bug not in open items

From
Jeff Davis
Date:
On Fri, 2009-04-10 at 14:47 -0400, Tom Lane wrote:
> This patch is so wrong that it's scary.  You can't have
> ImmediateInterruptOK true over the duration of any significant amount of
> backend processing --- as an example, if you take control away in the
> middle of a malloc call, you'll probably be left with a corrupt malloc
> arena.
>

Thank you for the explanation. My initial thinking was that either
DoingCommandRead would protect us (for SIGINT to the backend), or we
were going to terminate the process anyway (for SIGTERM). But it sounds
like it leaves us in a state so unsafe that we can't even abort the
transaction nicely.

Regards,
    Jeff Davis

Re: possible bug not in open items

From
Tom Lane
Date:
Jeff Davis <pgsql@j-davis.com> writes:
> Thank you for the explanation. My initial thinking was that either
> DoingCommandRead would protect us (for SIGINT to the backend), or we
> were going to terminate the process anyway (for SIGTERM). But it sounds
> like it leaves us in a state so unsafe that we can't even abort the
> transaction nicely.

Well, we could presumably do exit(1) regardless.  But if the idea is to
have a clean shutdown, you have to get through proc_exit(), and that
requires essentially all the backend subsystems to be alive and
undamaged.

            regards, tom lane

Re: possible bug not in open items

From
Bruce Momjian
Date:
Was this ever addressed?

---------------------------------------------------------------------------

Jeff Davis wrote:
> On Thu, 2009-03-26 at 21:45 -0400, Bruce Momjian wrote:
> > > http://archives.postgresql.org/pgsql-bugs/2009-03/msg00062.php
> > >
> > > It may or may not be a real bug, but I didn't receive any response. If
> > > you think it might be a bug, can you please add it to the open items?
> >
> > Hmm, odd I don't have it either;  can you repost it?
>
> The docs say:
>
> "SIGINT -- The server disallows new connections and sends all existing
> server processes SIGTERM, which will cause them to abort their current
> transactions and exit promptly."
>
> http://www.postgresql.org/docs/8.3/static/server-shutdown.html
>
> If you have an open COPY and no data is moving, it simply won't
> terminate it. You can terminate it with ctrl-C from psql, but not a
> SIGINT to the postmaster or a SIGINT or SIGTERM to the backend.
>
> Regards,
>     Jeff Davis

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com
  PG East:  http://www.enterprisedb.com/community/nav-pg-east-2010.do
  + If your life is a hard drive, Christ can be your backup. +

Re: possible bug not in open items

From
Jeff Davis
Date:
On Thu, 2010-02-25 at 23:15 -0500, Bruce Momjian wrote:
> Was this ever addressed?
>

It doesn't appear to be fixed, and I don't see it on the TODO, either.
Should we add it there?

Regards,
    Jeff Davis

> ---------------------------------------------------------------------------
>
> Jeff Davis wrote:
> > On Thu, 2009-03-26 at 21:45 -0400, Bruce Momjian wrote:
> > > > http://archives.postgresql.org/pgsql-bugs/2009-03/msg00062.php
> > > >
> > > > It may or may not be a real bug, but I didn't receive any response. If
> > > > you think it might be a bug, can you please add it to the open items?
> > >
> > > Hmm, odd I don't have it either;  can you repost it?
> >
> > The docs say:
> >
> > "SIGINT -- The server disallows new connections and sends all existing
> > server processes SIGTERM, which will cause them to abort their current
> > transactions and exit promptly."
> >
> > http://www.postgresql.org/docs/8.3/static/server-shutdown.html
> >
> > If you have an open COPY and no data is moving, it simply won't
> > terminate it. You can terminate it with ctrl-C from psql, but not a
> > SIGINT to the postmaster or a SIGINT or SIGTERM to the backend.
> >
> > Regards,
> >     Jeff Davis
>

Re: possible bug not in open items

From
Tom Lane
Date:
Jeff Davis <pgsql@j-davis.com> writes:
> On Thu, 2010-02-25 at 23:15 -0500, Bruce Momjian wrote:
>> Was this ever addressed?

> It doesn't appear to be fixed, and I don't see it on the TODO, either.
> Should we add it there?

+1.  It likely wouldn't be real hard to fix, but given the lack of field
complaints I'm not thinking we need to treat it as urgent.

            regards, tom lane

Re: possible bug not in open items

From
Bruce Momjian
Date:
Tom Lane wrote:
> Jeff Davis <pgsql@j-davis.com> writes:
> > On Thu, 2010-02-25 at 23:15 -0500, Bruce Momjian wrote:
> >> Was this ever addressed?
>
> > It doesn't appear to be fixed, and I don't see it on the TODO, either.
> > Should we add it there?
>
> +1.  It likely wouldn't be real hard to fix, but given the lack of field
> complaints I'm not thinking we need to treat it as urgent.

Added to TODO:

    Allow a stalled COPY to exit if the backend is terminated

        * http://archives.postgresql.org/pgsql-bugs/2009-04/msg00067.php

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  PG East:  http://www.enterprisedb.com/community/nav-pg-east-2010.do